Abstract



In this paper, we provide a novel construction of the linear-sized spectral sparsifiers of Batson, Spielman and Srivastava [BSS14]. While previous constructions required $\Omega(n^4)$ running time [BSS14, Zou12], our sparsification routine can be implemented in almost-quadratic running time $O(n^{2+\varepsilon})$.

The fundamental conceptual novelty of our work is the leveraging of a strong connection between sparsification and a regret minimization problem over density matrices. This connection was known to provide an interpretation of the randomized sparsifiers of Spielman and Srivastava [SS11] via the application of matrix multiplicative weight updates (MWU) [CHS11, Vis14]. In this paper, we explain how matrix MWU naturally arises as an instance of the Follow-the-Regularized-Leader framework and generalize this approach to yield a larger class of updates. This new class allows us to accelerate the construction of linear-sized spectral sparsifiers, and gives novel insights into the motivation behind Batson, Spielman and Srivastava [BSS14].


*Part of this work was performed when the third author was an instructor at MIT Math, and when all the authors were visiting the Simons Institute at Berkeley.




1 Introduction 

A powerful tool to handle large-scale graphs is to compress them by reducing their sizes, while preserving properties of interest such as the size of cuts [BK96, BK02] or the routability of certain flows [CLLM10]. These sparsification procedures also play an important role as fundamental primitives behind many fast graph algorithms [KLOS14, PS14]. In this paper, we consider the strong notion of spectral sparsifier put forward by Spielman and Teng [ST04, ST11]: $G'$ is a $(1+\varepsilon)$-spectral approximation of $G$ if $G'$ is a subgraph of $G$ with possibly reweighted edges, and for every $x \in \mathbb{R}^n$,

$$x^T L_G x \le x^T L_{G'} x \le (1+\varepsilon)\, x^T L_G x, \quad\text{or equivalently}\quad L_G \preceq L_{G'} \preceq (1+\varepsilon) L_G,$$

where $L_G$ and $L_{G'}$ are respectively the graph Laplacian matrices of $G$ and $G'$.

The algorithm of Spielman and Srivastava [SS11] constructs $(1+\varepsilon)$-spectral sparsifiers with $O(n\log n/\varepsilon^2)$ edges in nearly-linear time by randomly sampling edges with probabilities proportional to their effective resistances. In a seminal paper, Batson, Spielman and Srivastava [BSS14] give $(1+\varepsilon)$-spectral sparsifiers with $O(n/\varepsilon^2)$ edges, but their construction and the subsequent algorithm of [Zou12] require $O(mn^3/\varepsilon^2)$ and $O(mn^2/\varepsilon^2 + n^4/\varepsilon^4)$ time, respectively. We refer to their analysis and algorithm as BSS for short. The main contribution of this paper is an improved construction of linear-sized spectral sparsifiers that runs in almost-quadratic time.

Theorem 1. For any even integer $q \ge 2$ and any $\varepsilon \in (0, \ldots)$, there is an algorithm that, for any weighted undirected graph $G$ with $n$ vertices and $m$ edges, with probability at least $1 - \ldots$, constructs a $(1+\varepsilon)$-spectral sparsifier $G'$ that has at most $O(\sqrt{q}\, n/\varepsilon^2)$ edges in time $O(mn^{1+1/q}/\varepsilon^5)$.

Since $q$ can be chosen as a large constant and the graph can be preprocessed to reduce the number of edges to $m = O(n\log n)$, the above running time is almost quadratic in $n$.

Graph sparsification is a special case of sparsifying sums of rank-1 PSD matrices (see [BSS14] and Appendix B). Our algorithm for Theorem 1 also applies to this more general problem with an almost-cubic running time, which is still an improvement over the previous quartic running time.

Theorem 2. For any even integer $q \ge 2$ and any $\varepsilon \in (0, \ldots)$, there is an algorithm that, for any decomposition $I = \sum_{i=1}^m v_i v_i^T \in \mathbb{R}^{n\times n}$ into rank-1 matrices, with probability at least $1 - \ldots$, constructs scalars $s_i \ge 0$ with $|\{i : s_i > 0\}| \le O(\sqrt{q}\, n/\varepsilon^2)$ that satisfy $I \preceq \sum_i s_i v_i v_i^T \preceq (1+\varepsilon) I$ in time $O(n^{3+1/q}/\varepsilon^5 + mn/\varepsilon^4)$.

The fundamental conceptual novelty of our work is the establishment of a deep connection between graph or matrix sparsification and a regret minimization problem over PSD matrices (see Section 1.1). This relation was known [CHS11, Vis14] for the randomized sparsifiers of Spielman and Srivastava [SS11], for which the underlying matrix concentration bound can be easily recovered as an application of the matrix version of Multiplicative Weight Updates (MWU) [AK07, Ore11], a standard online learning algorithm. However, it was not clear how this interpretation could be extended to BSS, even though a clear analogy was also noted by de Carli Silva, Harvey and Sato (see [CHS11, Section 8]). Both MWU and BSS rely on potential function arguments, where the potential is essentially a robust version of the maximum and minimum graph eigenvalues. In this paper, we provide the missing piece of this interpretation: we consider a generalization of MWU to a larger class of updates, and show that BSS can be recovered as an instance of this class. Beyond our faster implementation of sparsification, we believe that this interpretation is of independent interest and may be useful in other areas in which the argument of BSS has found application [Nao12].

We focus on updates coming from the follow-the-regularized-leader (FTRL) framework. The choice of regularizer in this framework fully determines the update strategy and the corresponding potential function; see, for example, the recent survey by Hazan [Haz12]. The standard MWU argument can be recovered as an instance of FTRL, where the regularizer is chosen to be the entropy function. In contrast, we choose a different class of regularizers consisting of all $\ell_{1-1/q}$ semi-norms for $q \ge 2$, and provide corresponding regret bounds in Section 3. In Section 4 and Section 5, we show that the choice $q = 2$ recovers an algorithm that is somewhat similar to BSS and produces linear-sized spectral sparsifiers. This algorithm can be implemented to run in $O(mn^{3/2})$ time. Finally, in Section 6, we consider regularizers corresponding to large constant $q > 2$, which yield algorithms very different from BSS with almost-quadratic running time.

1.1 Regret Minimization 

In this subsection, we discuss our contribution on the problem of regret minimization in online linear optimization [Haz12]. Our technical results apply to the more general case of online PSD linear optimization over the set of density matrices, but our key contributions are described more concisely in the scalar case.

Let $\Delta_n = \{x \in \mathbb{R}^n : x \ge 0 \wedge \mathbf{1}^T x = 1\}$ be the unit simplex in $\mathbb{R}^n$, and call a vector in $\Delta_n$ an action. A player is going to play $T$ actions $x_0, \ldots, x_{T-1} \in \Delta_n$ in a row; only after playing $x_k$ does the player observe a feedback vector $f_k \in \mathbb{R}^n$, which may depend on $x_k$, and suffer the linear loss $\langle f_k, x_k\rangle$. The regret minimization problem asks us to devise a strategy for the player that minimizes the regret, i.e., the difference between the total loss suffered by the player and the loss suffered by the a posteriori best fixed action $u \in \Delta_n$:

$$\text{minimize}\ \max_{u\in\Delta_n} R(u), \quad\text{where } R(u) = \sum_{k=0}^{T-1} \langle f_k, x_k - u\rangle.$$

A well-known strategy for this problem is to update $x_k$ in a multiplicative fashion: for each coordinate $i \in [n]$, define $x_{k+1,i}$ to be proportional to $x_{k,i}\cdot\exp(-\alpha f_{k,i})$ for some parameter $\alpha > 0$. This strategy is known as the multiplicative weight update. Its classical analysis [AHK12] implies

$$\forall u \in \Delta_n,\quad R(u) = \sum_{k=0}^{T-1}\langle f_k, x_k - u\rangle \le \alpha \sum_{k=0}^{T-1}\|f_k\|_\infty^2 + \frac{\log n}{\alpha}. \tag{1.1}$$

The first term on the right-hand side contributes a regret of $\alpha\|f_k\|_\infty^2$ that is paid at every iteration, and we call it the width term. The second term is a fixed start-up cost corresponding to "how long it takes the update to explore the whole $\Delta_n$", and we call it the diameter term. If for all iterations $k$, $\|f_k\|_\infty$ is upper bounded by $\rho$, known as the width of the problem, the trade-off between the width and diameter terms can be optimized by the choice of $\alpha > 0$ to show that the total regret is at most $O(\rho\sqrt{T\log n})$.
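The multiplicative update and the bound (1.1) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `mwu_regret` and the synthetic losses are hypothetical, and we assume $\alpha|f_{k,i}| \le 1$, the usual condition under which the classical analysis gives (1.1).

```python
import math


def mwu_regret(losses, alpha):
    """Play scalar MWU against a fixed list of loss vectors.

    Returns (regret, bound), where bound = alpha * sum_k ||f_k||_inf^2 + log(n) / alpha
    is the right-hand side of (1.1). Assumes alpha * |f_{k,i}| <= 1 for all k, i.
    """
    n = len(losses[0])
    x = [1.0 / n] * n            # x_0 is the uniform distribution
    player_loss = 0.0
    cumulative = [0.0] * n       # cumulative[i] = sum_k f_{k,i}
    width = 0.0                  # sum_k ||f_k||_inf^2
    for f in losses:
        player_loss += sum(fi * xi for fi, xi in zip(f, x))
        cumulative = [ci + fi for ci, fi in zip(cumulative, f)]
        width += max(abs(fi) for fi in f) ** 2
        # x_{k+1,i} is proportional to x_{k,i} * exp(-alpha * f_{k,i})
        w = [xi * math.exp(-alpha * fi) for xi, fi in zip(x, f)]
        total = sum(w)
        x = [wi / total for wi in w]
    # a linear loss is minimized over the simplex at a vertex, so the best
    # fixed action in hindsight is the coordinate with the smallest total loss
    regret = player_loss - min(cumulative)
    bound = alpha * width + math.log(n) / alpha
    return regret, bound


# Usage: T rounds of losses bounded by rho = 1, with alpha tuned for that width.
T, n = 200, 4
losses = [[math.sin(k + 2 * i) for i in range(n)] for k in range(T)]
alpha = math.sqrt(math.log(n) / T)   # balances the width and diameter terms
regret, bound = mwu_regret(losses, alpha)
```

With $\rho = 1$ and this choice of $\alpha$, the bound is at most $\alpha T + \log n/\alpha = 2\sqrt{T\log n}$, which is the $O(\rho\sqrt{T\log n})$ trade-off described above.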

Optimization Interpretation. We take an optimization perspective to describe MWU and its generalizations by characterizing our strategies as instances of the follow-the-regularized-leader and mirror descent frameworks. Let $w(\cdot)$ be a strongly convex function over the simplex, known as the regularizer. The follow-the-regularized-leader strategy with parameter $\alpha > 0$ can be described as a trade-off between minimizing the loss incurred so far and the value of the regularizer:

$$\text{FTRL:}\quad x_{k+1} = \arg\min_{z\in\Delta_n}\Big\{ w(z) + \alpha\sum_{j=0}^{k}\langle f_j, z\rangle\Big\}. \tag{1.2}$$

Similarly, the mirror descent strategy optimizes a trade-off between the current loss and the Bregman divergence from the previous action:

$$\text{MirrorDescent:}\quad x_0 = \big(\tfrac1n,\ldots,\tfrac1n\big),\qquad x_{k+1} \leftarrow \arg\min_{z\in\Delta_n}\big\{ V_{x_k}(z) + \alpha\langle f_k, z\rangle\big\}, \tag{1.3}$$

where $V_x(y) \stackrel{\mathrm{def}}{=} w(y) - w(x) - \langle\nabla w(x), y - x\rangle$ is the induced Bregman divergence. Under mild assumptions (which are satisfied in this paper; see Appendix A), it is easy to check that MirrorDescent is equivalent to FTRL. We therefore use MirrorDescent and FTRL interchangeably in the rest of the paper: FTRL gives the cleaner description of the updates, while MirrorDescent provides a simpler analysis. The MWU strategy is an instance of the two equivalent strategies above, with the choice of regularizer $w(x) = \sum_i x_i\log x_i - x_i$, i.e., the (negative) entropy function.
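To see concretely that the entropy regularizer turns FTRL (1.2) into MWU, the sketch below (hypothetical helper names, scalar case only) evaluates the closed-form FTRL minimizer $x_{k+1,i} \propto \exp(-\alpha\sum_{j\le k} f_{j,i})$ and compares it with running the mirror descent step (1.3). For the entropy Bregman divergence, that step is exactly one multiplicative update per round.

```python
import math


def ftrl_entropy(losses, alpha):
    """FTRL (1.2) with w(x) = sum_i x_i log x_i - x_i: closed-form minimizer over the simplex."""
    n = len(losses[0])
    cum = [alpha * sum(f[i] for f in losses) for i in range(n)]
    shift = min(cum)  # subtracting a constant only changes c; keeps every exponent <= 0
    w = [math.exp(-(ci - shift)) for ci in cum]
    total = sum(w)
    return [wi / total for wi in w]


def mirror_descent_entropy(losses, alpha):
    """MirrorDescent (1.3) with the entropy Bregman divergence, started at uniform x_0.

    The minimizer of V_{x_k}(z) + alpha <f_k, z> over the simplex is the
    multiplicative update z_i proportional to x_{k,i} * exp(-alpha * f_{k,i}).
    """
    n = len(losses[0])
    x = [1.0 / n] * n
    for f in losses:
        w = [xi * math.exp(-alpha * fi) for xi, fi in zip(x, f)]
        total = sum(w)
        x = [wi / total for wi in w]
    return x


losses = [[0.3, -0.2, 0.5], [1.0, 0.0, -0.4], [-0.7, 0.2, 0.1]]
x_ftrl = ftrl_entropy(losses, alpha=0.8)
x_md = mirror_descent_entropy(losses, alpha=0.8)
```

The two vectors agree up to floating-point rounding, which is the equivalence of FTRL and MirrorDescent in the entropy case.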

Previous Work. MWU is a simple but extremely powerful algorithmic tool that has been repeatedly discovered in the theory of computation, machine learning, optimization, and game theory (see for instance the survey [AHK12] and the book [CL06]). MWU has found numerous important applications in semidefinite programming [AK07, AHK05], constraint satisfaction problems [Ste10], maximum flow [CKM+11], sparsest cut [She09], balanced separator [OSV12], small set expansion [BFK+11], the traveling salesman problem [AGM+10], zero-sum games [DDK11], and fractional packing problems [GK07]. The analysis of follow-the-regularized-leader can be found in the surveys [Haz12, Sha07], while that of mirror descent appears in the book [BN13].

Beyond MWU. Historically, MWU has been extended in at least three orthogonal directions. In this paper, we pursue all three directions simultaneously (see our summary in Table 1).


1. From vector to matrix. Instead of studying actions $x$ in the form of $n$-dimensional probability distributions, one can study density matrices $X \in \Delta_{n\times n}$, the set of PSD matrices whose trace equals one. This is a generalization from a set of "experts" corresponding to $\{e_1, \ldots, e_n\}$ to all combinations of the form $\sum_i t_i e_i$, where $t$ lies on the $n$-dimensional unit sphere $\mathbb{S}^{n-1}$. Accordingly, each loss vector $f_k$ can be generalized to a symmetric matrix $F_k \in \mathbb{R}^{n\times n}$, so that the loss of any density matrix $X$ becomes $F_k \bullet X = \mathrm{Tr}(F_k X)$. (If $X = vv^T$ is of rank one, then $F_k \bullet X = v^T F_k v$.) Among many applications, the matrix version of MWU has been used in designing algorithms for solving semidefinite programs [AK07] and finding balanced separators [OSV12], and in the proof of QIP = PSPACE [JJUW11].

2. Local norm convergence. The width term $\|f_k\|_\infty^2$ in the regret upper bound (1.1) can be replaced with $\langle |f_k|, x_k\rangle\cdot\|f_k\|_\infty$. (Here, we use $|f_k|$ to denote the coordinate-wise absolute value of $f_k$.) This technique is known as the local-norm technique, because $\langle |f_k|, x_k\rangle$ is a local way to measure the length of $f_k$ with respect to $x_k$. Since $\langle |f_k|, x_k\rangle\cdot\|f_k\|_\infty$ is never larger than $\|f_k\|_\infty^2$ as long as $x_k \in \Delta_n$, this new upper bound can only be smaller than the original. Indeed, this tighter bound has proved useful in the multi-armed bandit problem [AHR12] and in the solution of positive linear programs [AO15]. It also underpins the negative-width technique of [AHK12].


3. Change of regularizer. If one replaces the entropy regularizer with the $\ell_{1-1/q}$-regularizer $w(x) = -\frac{q}{q-1}\sum_{i=1}^n x_i^{1-1/q}$ for any $q \ge 2$, the corresponding update rule changes from

$$x_{k+1,i} = \exp\Big(c - \alpha\sum_{j=0}^{k} f_{j,i}\Big) \quad\text{to}\quad x_{k+1,i} = \Big(\alpha\sum_{j=0}^{k} f_{j,i} + c\Big)^{-q},$$

where in both cases $c$ is the unique constant that ensures $x_{k+1} \in \Delta_n$. The FTRL framework is very powerful, as the choice of regularizer $w(x)$ completely determines both the form and the analysis of the update strategy. Ultimately, different regularizers achieve different trade-offs between the width and diameter terms in Equation (1.1). For instance, the $\ell_{1/2}$-regularizer yields the following regret bound:

$$\forall u \in \Delta_n,\quad R(u) \le O(\alpha)\cdot\sum_{k=0}^{T-1}\langle |f_k|, x_k\rangle\cdot\max_{i\in[n]}\big|f_{k,i}\sqrt{x_{k,i}}\big| + \frac{2\sqrt n}{\alpha}.$$

The diameter term is now $2\sqrt n$, much worse than $\log n$ in the entropy case in (1.1). However, since (the local-norm version of) the width term goes from $\langle |f_k|, x_k\rangle\cdot\|f_k\|_\infty$ to $\langle |f_k|, x_k\rangle\cdot\max_{i\in[n]}|f_{k,i}\sqrt{x_{k,i}}|$, the width term may become smaller. This is exactly what happens in sparsification, where the feedback vectors, corresponding to the edges added to the sparsifier, may be weighted up by a factor as large as $n$, so that we may have $\|f_k\|_\infty \ge n$. In this scenario, the use of a more strongly convex regularizer, such as $\ell_{1/2}$, allows us to measure the width in a more convenient local norm and yields the BSS linear-sized sparsifier (see Figure 1 on page 12 for a visual comparison of different regularizers). We point out that the $\ell_{1-1/q}$-regularizers have also been used, albeit solely in the scalar case, by the machine learning community to obtain asymptotically optimal strategies for the multi-armed bandit problem [ABL11, BC12].

| Paper | Allow Matrix? | Allow Local Norm? | Allow Non-Entropy Regularizer? |
|---|---|---|---|
| [PST95, FS95], [AHK05, AHK12] | no | no | no |
| [AHR12, AO15] | no | yes | no |
| [ABL11, BC12] | no | yes | yes |
| [AK07, OSV12] | yes | no | no |
| [HKS12] | yes | yes | no |
| [this paper] | yes | yes | yes |

Table 1: Comparisons among prior results on the regret minimization problem.
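The $\ell_{1-1/q}$ update in item 3 differs from MWU mainly in how $c$ is found: there is no closed-form normalizer, but the total mass $\sum_i(\alpha\sum_j f_{j,i} + c)^{-q}$ is decreasing in $c$ on the admissible range, so bisection works. The Python sketch below (helper names are ours, not the paper's) computes that action and, for a single round, the two local-norm width terms discussed above; the second one shrinks when a large loss sits on a coordinate with small weight.

```python
def l1q_action(cum_loss, alpha, q, iters=200):
    """FTRL action for the l_{1-1/q} regularizer: x_i = (alpha * cum_loss[i] + c) ** (-q).

    c is the unique constant with every base positive and sum_i x_i = 1; the sum is
    decreasing in c on that range, so bisection finds it.
    """
    n = len(cum_loss)
    scaled = [alpha * g for g in cum_loss]
    floor = -min(scaled)                    # c must exceed this for all bases to be > 0
    lo = floor + 1.0                        # smallest base equals 1, so the sum is >= 1
    hi = floor + n ** (1.0 / q)             # every base >= n^(1/q), so the sum is <= 1
    for _ in range(iters):
        c = 0.5 * (lo + hi)
        if sum((s + c) ** (-q) for s in scaled) > 1.0:
            lo = c
        else:
            hi = c
    c = 0.5 * (lo + hi)
    return [(s + c) ** (-q) for s in scaled]


def width_terms(f, x):
    """One round's local-norm width terms: entropy version vs. the l_{1/2} version."""
    local = sum(abs(fi) * xi for fi, xi in zip(f, x))                     # <|f|, x>
    entropy_width = local * max(abs(fi) for fi in f)                       # <|f|, x> * ||f||_inf
    l12_width = local * max(abs(fi) * xi ** 0.5 for fi, xi in zip(f, x))   # <|f|, x> * max |f_i sqrt(x_i)|
    return entropy_width, l12_width


x = l1q_action([0.0, 1.0, -2.0, 0.5], alpha=0.3, q=2)
# A loss of size 100 on a coordinate of weight 0.01: the max factor becomes
# 100 * sqrt(0.01) = 10 instead of ||f||_inf = 100.
entropy_w, l12_w = width_terms([0.0, 0.0, 100.0, 0.0], [0.49, 0.49, 0.01, 0.01])
```

As with MWU, the coordinate with the smallest cumulative loss receives the most weight; only the shape of the map from losses to weights changes.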

1.2 Extensions 

High-Rank Sparsification. The same algorithm as in Theorems 1 and 2 also applies to sparsifying sums of PSD matrices, rather than just rank-1 PSD matrices. This recovers the same result as de Carli Silva, Harvey, and Sato [CHS11]. Such an extension has been shown to be important for problems such as finding hypergraph sparsifiers, finding sparse SDP solutions, and finding sparsifiers on subgraphs. However, as in the rank-1 case, the detailed running time of our algorithm has to be examined separately for each specific sparsification problem.

As an example, given a weighted undirected graph $G$ that is decomposed into edge-disjoint subgraphs, the goal of linear-sized subgraph sparsification is to construct a $(1+O(\varepsilon))$-spectral sparsifier $G'$ of $G$, so that $G'$ consists only of the reweighted versions of at most $n/\varepsilon^2$ of the given subgraphs. The same algorithm as for Theorem 1 runs in time $O(mn^{1+1/q}/\varepsilon^5)$ for this problem.

Weak Unweighted Graph Sparsification. Given $\kappa \in [1, m/n]$, consider the problem of finding a $\kappa$-spectral sparsifier of $G$ containing $O(m/n)$ distinct edges from $E$, without reweighting. This problem has very recently been studied by Anderson, Gu and Melgaard [AGM14]. Our regret minimization framework allows us to design a simple, almost-quadratic-time algorithm for this problem, improving on the quartic time complexity of [AGM14].

2 Preliminaries 

Throughout this paper, for a cleaner presentation that depends on the context, we interchangeably use $X \bullet Y = \langle X, Y\rangle = \mathrm{Tr}(XY)$ to denote the inner product between two symmetric matrices. If $X$ is symmetric, we use $e^X$ to denote its matrix exponential and, when $X$ is PSD, $\log X$ to denote its matrix logarithm. If $X$ is symmetric with eigendecomposition $X = \sum_{i=1}^n \lambda_i v_i v_i^T$, we denote $|X| = \sum_{i=1}^n |\lambda_i|\, v_i v_i^T$. For any symmetric $X$, we use $\|X\|_{\mathrm{spe}}$ to denote the spectral norm of $X$, and $\lambda_{\max}(X), \lambda_{\min}(X)$ to denote its largest and smallest eigenvalues. We define $\Delta_{n\times n} = \{X \in \mathbb{R}^{n\times n} : X \succeq 0, \mathrm{Tr}X = 1\}$ to be the set of positive semidefinite (PSD) matrices with trace 1. This should be seen as the matrix generalization of the $n$-dimensional simplex $\Delta_n = \{x \in \mathbb{R}^n : x \ge 0, \mathbf{1}^T x = 1\}$.
Regularizers and Bregman Divergence. We are interested in two types of regularizers over $\Delta_{n\times n}$, namely, $w(X) = X\bullet(\log X - I)$, known as the entropy regularizer, and $w(X) = -\frac{q}{q-1}\mathrm{Tr}(X^{1-1/q})$ for some $q > 1$, which we call the $\ell_{1-1/q}$-regularizer. The corresponding Bregman divergences $V_X(Y) \stackrel{\mathrm{def}}{=} w(Y) - w(X) - \langle\nabla w(X), Y - X\rangle$ are the following.

entropy case: $V_X(Y) = Y\bullet(\log Y - \log X) - I\bullet(Y - X)$,

$\ell_{1-1/q}$ case: $V_X(Y) = X^{-1/q}\bullet Y + \frac{1}{q-1}\mathrm{Tr}X^{1-1/q} - \frac{q}{q-1}\mathrm{Tr}Y^{1-1/q}$.

Note that both regularizers above and their Bregman divergences are convex over the cone of PSD matrices.¹ We now state some classical properties of Bregman divergences. Their proofs are included in Appendix D for completeness.

Lemma 2.1. The Bregman divergence of a convex differentiable function $w(\cdot)$ has the following properties:

• Non-negativity: $V_X(Y) \ge 0$ for all $X, Y \succeq 0$.

• The "three-point equality": $\langle\nabla w(X) - \nabla w(Y), X - U\rangle = V_X(U) - V_Y(U) + V_Y(X)$.

• Given $\tilde X$ and $X = \arg\min_{Z\in\Delta_{n\times n}} V_{\tilde X}(Z)$ as the Bregman projection, we have the "generalized Pythagorean theorem" for all $U \in \Delta_{n\times n}$: $V_{\tilde X}(U) \ge V_X(U) + V_{\tilde X}(X) \ge V_X(U)$.
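The three-point equality is a purely algebraic identity, so it is easy to sanity-check numerically. The sketch below (our own helper names) restricts to diagonal matrices, i.e., positive vectors, where both regularizers reduce to their scalar forms. It also checks the closed-form $\ell_{1-1/q}$ divergence stated above against the definition $V_X(Y) = w(Y) - w(X) - \langle\nabla w(X), Y - X\rangle$.

```python
import math
from functools import partial


def entropy_w(x):
    return sum(xi * math.log(xi) - xi for xi in x)


def entropy_grad(x):
    return [math.log(xi) for xi in x]


def l1q_w(x, q):
    return -q / (q - 1) * sum(xi ** (1 - 1 / q) for xi in x)


def l1q_grad(x, q):
    return [-(xi ** (-1 / q)) for xi in x]


def bregman(w, grad, x, y):
    """V_x(y) = w(y) - w(x) - <grad w(x), y - x>."""
    return w(y) - w(x) - sum(g * (yi - xi) for g, yi, xi in zip(grad(x), y, x))


def l1q_bregman_closed_form(x, y, q):
    """X^{-1/q} . Y + Tr X^{1-1/q} / (q-1) - q/(q-1) Tr Y^{1-1/q}, for diagonal X, Y."""
    return (sum(xi ** (-1 / q) * yi for xi, yi in zip(x, y))
            + sum(xi ** (1 - 1 / q) for xi in x) / (q - 1)
            - q / (q - 1) * sum(yi ** (1 - 1 / q) for yi in y))


def three_point_gap(w, grad, x, y, u):
    """<grad w(x) - grad w(y), x - u> - (V_x(u) - V_y(u) + V_y(x)); zero by Lemma 2.1."""
    lhs = sum((gx - gy) * (xi - ui) for gx, gy, xi, ui in zip(grad(x), grad(y), x, u))
    rhs = bregman(w, grad, x, u) - bregman(w, grad, y, u) + bregman(w, grad, y, x)
    return lhs - rhs


x, y, u = [0.2, 0.5, 0.3], [0.6, 0.1, 0.3], [0.1, 0.1, 0.8]
q = 3
w_q, g_q = partial(l1q_w, q=q), partial(l1q_grad, q=q)
```

A zero gap for arbitrary positive $x, y, u$ is what one expects, since the identity holds for any differentiable $w$; non-negativity, by contrast, relies on convexity.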

3 Regret Minimization in Full Information 

In this section, we consider the following setting of the regret minimization problem, known as the full-information setting. At each iteration $k = 0, \ldots, T-1$, the player chooses an action $X_k \in \Delta_{n\times n}$, receives a symmetric loss matrix $F_k \in \mathbb{R}^{n\times n}$, and suffers a loss $\langle F_k, X_k\rangle$. At this point, the player is allowed to observe the full matrix $F_k$ without any restriction.

Again, the goal of the player is to minimize the regret with respect to any fixed matrix $U \in \Delta_{n\times n}$:

$$R(U) = \sum_{k=0}^{T-1}\langle F_k, X_k - U\rangle.$$

The best choice of $U$ in hindsight can be taken as the rank-1 projection onto a minimum eigenvector of $\sum_{k=0}^{T-1} F_k$. As a result, the total loss for the best choice of $U$ is $\lambda_{\min}\big(\sum_{k=0}^{T-1} F_k\big)$.

Entropy Regularizer. If $w(\cdot)$ is the entropy regularizer, then (1.2) can be explicitly written as

$$\text{MirrorDescent}_{\exp}:\quad X_k = \exp\Big(cI - \alpha\sum_{j=0}^{k-1} F_j\Big), \tag{3.1}$$

where $c \in \mathbb{R}$ is the unique constant that ensures $\mathrm{Tr}X_k = 1$. This is also known as the matrix multiplicative weight update method, and the following theorem gives its regret bound.²

Theorem 3.1. In MirrorDescent$_{\exp}$, if the parameter $\alpha > 0$ satisfies $\alpha F_k \succeq -I$ for all iterations $k = 0, 1, \ldots, T-1$, then, for every $U \in \Delta_{n\times n}$,

$$R(U) = \sum_{k=0}^{T-1}\langle F_k, X_k - U\rangle \le \alpha\sum_{k=0}^{T-1}(X_k\bullet|F_k|)\cdot\|F_k\|_{\mathrm{spe}} + \frac{V_{X_0}(U)}{\alpha}.$$

We note that $V_{X_0}(U) \le \log n$.
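For small dense matrices, (3.1) can be evaluated directly from an eigendecomposition of $\sum_{j<k}F_j$: the constant $c$ only rescales $\exp(-\alpha\sum_j F_j)$ to unit trace. The sketch below uses a textbook cyclic Jacobi eigensolver so that it stays dependency-free; `jacobi_eigh` and `matrix_mwu_action` are our own illustrative helpers, and a practical implementation would call a library eigensolver instead.

```python
import math


def jacobi_eigh(a, rel_tol=1e-26, max_sweeps=100):
    """Cyclic Jacobi eigensolver for a small symmetric matrix given as a list of lists.

    Returns (lam, v) with A = V diag(lam) V^T; column k of v is an eigenvector.
    """
    n = len(a)
    a = [list(map(float, row)) for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= rel_tol * sum(x * x for row in a for x in row):
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J on columns p, q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A on rows p, q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                a[p][q] = a[q][p] = 0.0  # zero in exact arithmetic by the choice of t
                for k in range(n):  # V <- V J accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def matrix_mwu_action(loss_sum, alpha):
    """X = exp(cI - alpha * loss_sum) with c chosen so that Tr X = 1, as in (3.1)."""
    lam, v = jacobi_eigh(loss_sum)
    n = len(lam)
    smallest = min(lam)
    # shifting every eigenvalue by `smallest` is absorbed into c and avoids overflow
    weights = [math.exp(-alpha * (l - smallest)) for l in lam]
    z = sum(weights)
    p = [wt / z for wt in weights]  # eigenvalues of X
    return [[sum(v[i][k] * p[k] * v[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


# X_2 after two rounds with (hypothetical) symmetric losses F_0, F_1.
F = [[[0.5, 0.2, 0.0], [0.2, -0.3, 0.1], [0.0, 0.1, 0.4]],
     [[-0.2, 0.0, 0.3], [0.0, 0.6, -0.1], [0.3, -0.1, 0.0]]]
S = [[sum(Fk[i][j] for Fk in F) for j in range(3)] for i in range(3)]
X2 = matrix_mwu_action(S, alpha=0.5)
```

When all $F_j$ are diagonal, the eigenvectors are the coordinate axes and this reduces to the scalar MWU weights, which is a convenient check.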

Our proof of Theorem 3.1 uses a technique known as the tweaked version of mirror descent (see [Zin03, Rak09]). We define an intermediate point $\tilde X_{k+1} = \arg\min_{Z\succeq 0}\{V_{X_k}(Z) + \alpha\langle F_k, Z\rangle\}$ as the minimizer over $Z \succeq 0$, rather than over $Z \in \Delta_{n\times n}$ as in (1.3). Accordingly, the actual point $X_{k+1}$ equals $\arg\min_{Z\in\Delta_{n\times n}}\{V_{\tilde X_{k+1}}(Z)\}$, the Bregman projection of $\tilde X_{k+1}$ back onto the hyperplane $\mathrm{Tr}Z = 1$. This two-step interpretation of mirror descent gives a very clean proof of our regret bound, and we defer the full proof to Appendix E.

¹ While this is easy to check by taking the second derivative for the entropy regularizer, it is less obvious for the $\ell_{1-1/q}$ regularizer. The latter follows easily from Lieb's concavity theorem [Lie73, Bha97].

² The scalar version of this theorem was proved, for instance, in [AR09, Sha11, AO15]. A slightly different matrix version of this theorem was proved in [HKS12] (in particular, the authors of [HKS12] required $I \succeq \alpha F_k \succeq -I$, while in fact it suffices to only require $\alpha F_k \succeq -I$).

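$\ell_{1-1/q}$ Regularizer. If $w(\cdot)$ is the $\ell_{1-1/q}$ regularizer, then (1.2) can be explicitly written as

$$\text{MirrorDescent}_{\ell_{1-1/q}}:\quad X_k = \Big(cI + \alpha\sum_{j=0}^{k-1} F_j\Big)^{-q}, \tag{3.2}$$

where $c \in \mathbb{R}$ is the unique constant that ensures $cI + \alpha\sum_{j=0}^{k-1} F_j \succ 0$ and $\mathrm{Tr}X_k = 1$.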
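Analogously to the entropy case, (3.2) can be evaluated from one eigendecomposition of $\sum_{j<k}F_j$, after which only the scalar $c$ remains. Since $\mathrm{Tr}(cI + \alpha\sum_j F_j)^{-q}$ is decreasing in $c$ on the admissible range, bisection finds it. This is a hedged sketch with illustrative helper names; the eigensolver is again a small cyclic Jacobi routine included only to keep the example self-contained.

```python
import math


def jacobi_eigh(a, rel_tol=1e-26, max_sweeps=100):
    """Cyclic Jacobi eigensolver; returns (lam, v) with A = V diag(lam) V^T."""
    n = len(a)
    a = [list(map(float, row)) for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= rel_tol * sum(x * x for row in a for x in row):
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                a[p][q] = a[q][p] = 0.0
                for k in range(n):  # V <- V J
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def l1q_matrix_action(loss_sum, alpha, q, iters=200):
    """X = (cI + alpha * loss_sum)^(-q) with cI + alpha * loss_sum > 0 and Tr X = 1, as in (3.2).

    Returns (X, c).
    """
    lam, v = jacobi_eigh(loss_sum)
    n = len(lam)
    mu = [alpha * l for l in lam]                   # eigenvalues of alpha * sum_j F_j
    floor = -min(mu)                                # c must exceed this
    lo, hi = floor + 1.0, floor + n ** (1.0 / q)    # trace >= 1 at lo and <= 1 at hi
    for _ in range(iters):
        c = 0.5 * (lo + hi)
        if sum((m + c) ** (-q) for m in mu) > 1.0:
            lo = c
        else:
            hi = c
    c = 0.5 * (lo + hi)
    p = [(m + c) ** (-q) for m in mu]               # eigenvalues of X_k
    x = [[sum(v[i][k] * p[k] * v[j][k] for k in range(n)) for j in range(n)]
         for i in range(n)]
    return x, c


# An arbitrary (hypothetical) symmetric loss sum, with q = 4.
S = [[0.4, 0.1, 0.0], [0.1, -0.5, 0.2], [0.0, 0.2, 0.3]]
X, c = l1q_matrix_action(S, alpha=0.7, q=4)
```

With no losses yet, the routine returns $X_0 = I/n$ with $c = n^{1/q}$, which is the starting point used in the bounds on $V_{X_0}(U)$.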

If we focus on the special case of $q = 2$ and each $F_k$ having rank 1, the following theorem gives the regret bound for MirrorDescent$_{\ell_{1/2}}$.


Theorem 3.2. In MirrorDescent$_{\ell_{1/2}}$, if the parameter $\alpha > 0$, and the loss matrix $F_k$ is rank one and satisfies $X_k^{1/2}\bullet\alpha F_k > -1$ for all $k$, then, for every $U \in \Delta_{n\times n}$,

$$R(U) \stackrel{\mathrm{def}}{=} \sum_{k=0}^{T-1}\langle F_k, X_k - U\rangle \le \alpha\sum_{k=0}^{T-1}\frac{(X_k\bullet F_k)(X_k^{1/2}\bullet F_k)}{1 + X_k^{1/2}\bullet\alpha F_k} + \frac{V_{X_0}(U)}{\alpha}.$$

If we instead have $X_k^{1/2}\bullet\alpha F_k \ge -1/2$, the above bound can be simplified as

$$R(U) = \sum_{k=0}^{T-1}\langle F_k, X_k - U\rangle \le 2\alpha\sum_{k=0}^{T-1}(X_k\bullet F_k)(X_k^{1/2}\bullet F_k) + \frac{V_{X_0}(U)}{\alpha}.$$

We note that $V_{X_0}(U) \le 2\sqrt{n}$.

We recommend that interested readers see the proof of Theorem 3.2 in Appendix E, as it provides a straightforward generalization of Theorem 3.1 to regularizers other than entropy.

Theorem 3.2 is only a special case of the following more general regret bound, which holds for arbitrary $q \ge 2$ and for $F_k$ of arbitrary rank. At first reading, one can skip Theorem 3.3, because its sole purpose in this paper is to improve the running time of graph sparsification from $O(mn^{3/2})$ to $O(mn^{1+1/q})$, as well as to allow one to sparsify sums of high-rank PSD matrices.


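Theorem 3.3. In MirrorDescent$_{\ell_{1-1/q}}$, with $q \ge 2$ and $\alpha > 0$, if the loss matrix $F_k$ is either positive or negative semidefinite and satisfies $\alpha X_k^{1/(2q)} F_k X_k^{1/(2q)} \succeq \ldots$ for all $k$, then for every $U \in \Delta_{n\times n}$,

$$R(U) \stackrel{\mathrm{def}}{=} \sum_{k=0}^{T-1}\langle F_k, X_k - U\rangle \le O(q\alpha)\sum_{k=0}^{T-1}(X_k\bullet|F_k|)\cdot\big\|X_k^{1/(2q)} F_k X_k^{1/(2q)}\big\|_{\mathrm{spe}} + \frac{V_{X_0}(U)}{\alpha}.$$

We note that $V_{X_0}(U) \le \frac{q}{q-1}\, n^{1/q}$.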


(The proof of Theorem 3.3 is deferred to Appendix E.)

The key idea in proving Theorem 3.3 is to replace the use of the Sherman-Morrison formula in the proof of Theorem 3.2 with the Woodbury formula, so as to allow $F_k$ to be of high rank. The proof also uses the Lieb-Thirring trace inequality to handle arbitrary $q \ge 2$.


4 Warm-Up: Upper-Sided Linear-Sized Sparsification

In this section and the next, we present our construction of linear-sized sparsifiers in the general matrix setting. Its specialization to graph sparsification appears in Appendix B, while its efficient implementation is discussed in Section 6. To showcase how the regret bounds of Section 3 can be useful in the construction of sparsifiers, we start by describing a warm-up example in which we are only interested in obtaining a single side of the sparsification guarantee.

Suppose we are given a decomposition of the identity matrix $I = \sum_{e=1}^m w_e L_e$, where each $L_e$ satisfies $0 \preceq L_e \preceq I$ and is of rank 1 and trace 1, i.e., $L_e = vv^T$ for some $v \in \mathbb{R}^n$ with $\|v\|_2 = 1$.

The weights $w_e > 0$ may be unknown, though the trace guarantee ensures that $\sum_e w_e = n$. In this section, we are interested in finding some $s \in \Delta_m$ satisfying $\sum_{e=1}^m (ns_e)\cdot L_e \preceq (1+\varepsilon)I$, while the sparsity of $s$ (that is, $|\{e \in [m] : s_e > 0\}|$) is at most $O(n/\varepsilon^2)$. We call this upper-sided linear-sized spectral sparsification, because it only gives an upper bound on the eigenvalues of $\sum_{e=1}^m (ns_e)\cdot L_e$ and no lower bound.

Consider the following algorithm, which invokes the regret minimization framework of Section 3 to solve this upper-sided sparsification problem. We choose the $\ell_{1/2}$ regularizer and $\alpha = \varepsilon/(4\sqrt{n})$ for MirrorDescent$_{\ell_{1/2}}$.

At iteration $k$, set the feedback matrix to $F_k = -nL_{e_k}$, where $e_k$ minimizes $L_e\bullet X_k$ over $e \in [m]$.³

Before applying Theorem 3.2, let us first verify that the prerequisite $X_k^{1/2}\bullet\alpha F_k \ge -1/2$ holds. Because $\sum_{e\in[m]} w_e L_e\bullet X_k = I\bullet X_k = 1$ and $\sum_e w_e = n$, by an averaging argument we must have $L_{e_k}\bullet X_k \le 1/n$. This further implies $-\alpha n L_{e_k}\bullet X_k^{1/2} \ge -\alpha\sqrt{n} \ge -1/2$ due to the claim below.

Claim 4.1. For every $X \in \Delta_{n\times n}$, we have $L_e\bullet X^{1/2} \le (L_e\bullet X)^{1/2}$ for every $e \in [m]$.

Proof. Without loss of generality, one can assume $X$ to be diagonal. Next, since $L_e = v_e v_e^T$ is of rank one, the desired inequality follows from Jensen's inequality $v_e^T X^{1/2} v_e \le \sqrt{v_e^T X v_e}$ and the fact that $\|v_e\|_2^2 = \mathrm{Tr}L_e \le 1$. □
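Claim 4.1 can be spot-checked in two dimensions, where the square root of a PSD matrix has the closed form $X^{1/2} = (X + \sqrt{\det X}\, I)/\sqrt{\mathrm{Tr}X + 2\sqrt{\det X}}$ (a consequence of the Cayley-Hamilton theorem). The sketch below is ours and purely illustrative: it sweeps a small grid of $2\times 2$ density matrices and unit vectors $v_e$ and records the smallest value of $(L_e\bullet X)^{1/2} - L_e\bullet X^{1/2}$.

```python
import math


def sqrt_psd_2x2(x):
    """Principal square root of a nonzero 2x2 PSD matrix via Cayley-Hamilton."""
    (a, b), (_, d) = x
    s = math.sqrt(max(a * d - b * b, 0.0))   # sqrt(det X) = sqrt(lambda_1 * lambda_2)
    t = math.sqrt(a + d + 2.0 * s)          # sqrt(lambda_1) + sqrt(lambda_2)
    return [[(a + s) / t, b / t], [b / t, (d + s) / t]]


def quad(m, v):
    """v^T M v, i.e. M . L for the rank-1 matrix L = v v^T."""
    return sum(v[i] * m[i][j] * v[j] for i in range(2) for j in range(2))


def claim_4_1_gap(x, v):
    """(L . X)^{1/2} - L . X^{1/2} for L = v v^T; Claim 4.1 says this is >= 0."""
    return math.sqrt(quad(x, v)) - quad(sqrt_psd_2x2(x), v)


# Density matrices [[a, b], [b, 1 - a]] (PSD since b^2 <= a(1 - a) on this grid)
# and unit vectors v = (cos th, sin th).
worst = min(
    claim_4_1_gap([[a, b], [b, 1.0 - a]], [math.cos(th), math.sin(th)])
    for a in (0.1, 0.3, 0.5, 0.9)
    for b in (-0.2, 0.0, 0.15)
    for th in [0.3 * k for k in range(21)]
)
```

Equality holds exactly when $v_e$ is an eigenvector of $X$, so `worst` can be (numerically) zero but should never be meaningfully negative.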


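Now, applying Theorem 3.2, we obtain that for every $U \in \Delta_{n\times n}$,

$$\sum_{k=0}^{T-1}\langle -nL_{e_k}, X_k - U\rangle \le 2\alpha\sum_{k=0}^{T-1}(X_k\bullet nL_{e_k})(X_k^{1/2}\bullet nL_{e_k}) + \frac{2\sqrt{n}}{\alpha}.$$

After rearranging, and using $L_{e_k}\bullet X_k \le \frac1n$ and $nL_{e_k}\bullet X_k^{1/2} \le \sqrt{n}$, which we deduced earlier, we get

$$\frac{n}{T}\sum_{k=0}^{T-1}\langle L_{e_k}, U\rangle \le \frac{2\alpha}{T}\sum_{k=0}^{T-1}(X_k\bullet nL_{e_k})(X_k^{1/2}\bullet nL_{e_k}) + \frac{1}{T}\sum_{k=0}^{T-1}\langle nL_{e_k}, X_k\rangle + \frac{2\sqrt n}{\alpha T} \le \frac{2\alpha}{T}\cdot T\cdot 1\cdot\sqrt{n} + 1 + \frac{2\sqrt n}{\alpha T} = \frac{\varepsilon}{2} + 1 + \frac{8n}{\varepsilon T}.$$

Finally, choosing $T = 16n/\varepsilon^2$ and $U$ to be the rank-1 projection onto a maximum eigenvector, we conclude that $\lambda_{\max}\big(\frac{n}{T}\sum_{k=0}^{T-1} L_{e_k}\big) \le 1 + \varepsilon$. That is, setting $s_e$ to the fraction of iterations $k$ with $e_k = e$ gives $\sum_e (ns_e)L_e \preceq (1+\varepsilon)I$ with at most $T = O(n/\varepsilon^2)$ nonzero entries.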

This completes the description of our upper-sided linear-sized sparsification algorithm. The full sparsification algorithm, in the next section, will essentially consist of playing out this analysis on the lower and upper sides at the same time.
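The warm-up algorithm is short enough to run end to end on a toy instance. The sketch below is our own illustration, not the paper's implementation. It stores $A_k = \alpha n\sum_{j<k}L_{e_j}$, so that $X_k = (cI - A_k)^{-2}$ matches (3.2) with $q = 2$ and $F_j = -nL_{e_j}$. It then finds $c$ by bisection in $[\lambda_{\max}(A_k) + 1, \lambda_{\max}(A_k) + \sqrt n]$ and picks $e_k$ minimizing $L_e\bullet X_k$. The toy input, three unit vectors at angles $0, 2\pi/3, 4\pi/3$ in $\mathbb{R}^2$ (for which $\sum_e v_e v_e^T = \tfrac32 I$, so $w_e = 2/3$), is hypothetical.

```python
import math


def jacobi_eigh(a, rel_tol=1e-26, max_sweeps=100):
    """Cyclic Jacobi eigensolver; returns (lam, v) with A = V diag(lam) V^T."""
    n = len(a)
    a = [list(map(float, row)) for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= rel_tol * sum(x * x for row in a for x in row):
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                a[p][q] = a[q][p] = 0.0
                for k in range(n):
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def upper_sided_sparsify(vecs, eps):
    """Section 4 warm-up with the l_{1/2} regularizer (illustrative sketch).

    vecs: unit vectors v_e with sum_e w_e v_e v_e^T = I for some unknown w_e > 0.
    Returns s in the simplex; sum_e n s_e v_e v_e^T should be <= (1 + eps) I.
    """
    n, m = len(vecs[0]), len(vecs)
    alpha = eps / (4.0 * math.sqrt(n))
    T = math.ceil(16 * n / eps ** 2)
    acc = [[0.0] * n for _ in range(n)]          # A_k = alpha * n * sum_{j<k} L_{e_j}
    counts = [0] * m
    for _ in range(T):
        lam, u = jacobi_eigh(acc)
        top = max(lam)
        lo, hi = top + 1.0, top + math.sqrt(n)    # Tr(cI - A)^{-2} is >= 1 at lo, <= 1 at hi
        for _ in range(100):
            c = 0.5 * (lo + hi)
            if sum((c - l) ** -2 for l in lam) > 1.0:
                lo = c
            else:
                hi = c
        c = 0.5 * (lo + hi)
        p = [(c - l) ** -2 for l in lam]          # eigenvalues of X_k
        # L_e . X_k = sum_i p_i (v_e . u_i)^2, where u_i is column i of u
        scores = [sum(p[i] * sum(ve[r] * u[r][i] for r in range(n)) ** 2 for i in range(n))
                  for ve in vecs]
        e = min(range(m), key=scores.__getitem__)
        counts[e] += 1
        for r in range(n):
            for col in range(n):
                acc[r][col] += alpha * n * vecs[e][r] * vecs[e][col]
    return [cnt / T for cnt in counts]


n, eps = 2, 0.5
vecs = [[math.cos(2 * math.pi * j / 3), math.sin(2 * math.pi * j / 3)] for j in range(3)]
s = upper_sided_sparsify(vecs, eps)
M = [[sum(n * s[e] * vecs[e][r] * vecs[e][col] for e in range(3)) for col in range(n)]
     for r in range(n)]
lam_max = max(jacobi_eigh(M)[0])
```

Because $\mathrm{Tr}\sum_e (ns_e)L_e = n$, the largest eigenvalue is always at least 1; the analysis above caps it at $1 + \varepsilon$.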

We emphasize here that if one chooses the entropy regularizer by using MirrorDescent$_{\exp}$, and chooses $e_k = e$ with probability proportional to $w_e$, a similar analysis to the one above recovers the sparsification result of Spielman and Srivastava [SS11].


³ This choice naturally follows from a saddle-point interpretation of the problem, because it is the subgradient of the function $f(X) \stackrel{\mathrm{def}}{=} \min_{s\in\Delta_m}\sum_{e=1}^m (ns_e L_e)\bullet X$ at $X = X_k$. We have skipped the explanation of this choice due to space limitations.


5 Linear-Sized Sparsification 

As before, suppose we are given a decomposition of the identity matrix $I = \sum_{e=1}^m w_e L_e$, where each $L_e$ satisfies $0 \preceq L_e \preceq I$ and is of rank 1 and trace 1. The weights $w_e > 0$ may be unknown and satisfy $\sum_e w_e = n$. In this section, we are interested in finding scalars $s_e \ge 0$ satisfying

$$I \preceq \sum_{e=1}^m s_e\cdot L_e \preceq (1 + 8\varepsilon + O(\varepsilon^2))\, I, \tag{5.1}$$

while the sparsity of $s$ (that is, $|\{e \in [m] : s_e > 0\}|$) is at most $O(n/\varepsilon^2)$.

Instead of maintaining one sequence $X_k$ as in Section 4, we maintain two sequences $X_k, Y_k \in \Delta_{n\times n}$. At each iteration $k \in \{0, 1, \ldots, T-1\}$, find an arbitrary $e_k \in [m]$ such that

$$L_{e_k}\bullet X_k \le L_{e_k}\bullet Y_k.$$

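This is always possible by an averaging argument with weights $w_e$, since $\sum_e w_e L_e\bullet X_k = \sum_e w_e L_e\bullet Y_k = 1$. Next, we choose the $\ell_{1/2}$ regularizer and some parameter $\alpha < 1/2$ (in fact, we will choose $\alpha = \varepsilon$ later), and update

$$X_{k+1} = \arg\min_{Z\in\Delta_{n\times n}}\Big\{V_{X_k}(Z) + \Big\langle\frac{-\alpha L_{e_k}}{(X_k\bullet L_{e_k})^{1/2}}, Z\Big\rangle\Big\} \quad\text{and}\quad Y_{k+1} = \arg\min_{Z\in\Delta_{n\times n}}\Big\{V_{Y_k}(Z) + \Big\langle\frac{\alpha L_{e_k}}{(Y_k\bullet L_{e_k})^{1/2}}, Z\Big\rangle\Big\}. \tag{5.2}$$

In other words, we have picked feedback matrices $F_k = \frac{-L_{e_k}}{(X_k\bullet L_{e_k})^{1/2}}$ for the $X_k$ sequence and $F_k = \frac{L_{e_k}}{(Y_k\bullet L_{e_k})^{1/2}}$ for the $Y_k$ sequence.⁴ Applying Theorem 3.2 to the $X_k$ sequence (its prerequisite $X_k^{1/2}\bullet\alpha F_k \ge -\alpha > -1/2$ follows from Claim 4.1), we obtain that for every $U_X \in \Delta_{n\times n}$,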


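$$\sum_{k=0}^{T-1}\Big\langle\frac{-L_{e_k}}{(X_k\bullet L_{e_k})^{1/2}}, X_k - U_X\Big\rangle \le 2\alpha\sum_{k=0}^{T-1}\Big(X_k\bullet\frac{L_{e_k}}{(X_k\bullet L_{e_k})^{1/2}}\Big)\Big(X_k^{1/2}\bullet\frac{L_{e_k}}{(X_k\bullet L_{e_k})^{1/2}}\Big) + \frac{V_{X_0}(U_X)}{\alpha} = 2\alpha\sum_{k=0}^{T-1}X_k^{1/2}\bullet L_{e_k} + \frac{V_{X_0}(U_X)}{\alpha} \le 2\alpha\sum_{k=0}^{T-1}(X_k\bullet L_{e_k})^{1/2} + \frac{V_{X_0}(U_X)}{\alpha}.$$

Above, the last inequality uses Claim 4.1. If we denote $M_X \stackrel{\mathrm{def}}{=} \sum_{k=0}^{T-1}\frac{L_{e_k}}{(L_{e_k}\bullet X_k)^{1/2}}$ and rearrange the inequality above, we get

$$M_X\bullet U_X \le \frac{V_{X_0}(U_X)}{\alpha} + (1+2\alpha)\sum_{k=0}^{T-1}(L_{e_k}\bullet X_k)^{1/2}. \tag{5.3}$$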


Similarly, applying Theorem 3.2 to the $Y_k$ sequence, and defining $M_Y \stackrel{\mathrm{def}}{=} \sum_{k=0}^{T-1}\frac{L_{e_k}}{(L_{e_k}\bullet Y_k)^{1/2}}$, we obtain that for every $U_Y \in \Delta_{n\times n}$,

$$M_Y\bullet U_Y \ge -\frac{V_{Y_0}(U_Y)}{\alpha} + (1-2\alpha)\sum_{k=0}^{T-1}(L_{e_k}\bullet Y_k)^{1/2}. \tag{5.4}$$

In the rest of the proof, we will use (5.3) and (5.4) to deduce

$$\lambda_{\max}(M_Y) - \lambda_{\min}(M_Y) \le 8\varepsilon(1+O(\varepsilon))\,\lambda_{\min}(M_Y). \tag{5.5}$$


⁴ In fact, the denominator $(X_k\bullet L_{e_k})^{1/2}$ is defined so as to make sure that $F_k$ is the "maximally aggressive" loss matrix we can have for MirrorDescent$_{\ell_{1/2}}$.
Finally, since $M_Y = \sum_{k=0}^{T-1}\frac{L_{e_k}}{(L_{e_k}\bullet Y_k)^{1/2}}$ is a sum of at most $T = n/\varepsilon^2$ rank-1 matrices, dividing it by $\lambda_{\min}(M_Y)$ gives the desired sparsification for (5.1). We prove (5.5) in two steps.

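Lower-bounding $\lambda_{\min}(M_Y)$. Recall that we have $\mathrm{Tr}(M_X) = \sum_{k=0}^{T-1}\frac{1}{(L_{e_k}\bullet X_k)^{1/2}}$, because we have assumed each $L_e$ to be of trace 1. Denoting $a_k \stackrel{\mathrm{def}}{=} (L_{e_k}\bullet X_k)^{1/2}$, we have $\mathrm{Tr}(M_X) = \sum_{k=0}^{T-1}\frac{1}{a_k}$. We apply (5.3) here with $U_X = I/n = X_0$ (so that $V_{X_0}(U_X) = 0$), and obtain

$$\frac1n\sum_{k=0}^{T-1}\frac{1}{a_k} = \frac1n\mathrm{Tr}(M_X) \le (1+2\alpha)\sum_{k=0}^{T-1}(L_{e_k}\bullet X_k)^{1/2} = (1+2\alpha)\sum_{k=0}^{T-1}a_k.$$

Applying Cauchy-Schwarz, we have

$$\Big(\sum_{k=0}^{T-1}a_k\Big)^2 \ge \frac{1}{n(1+2\alpha)}\Big(\sum_{k=0}^{T-1}\frac{1}{a_k}\Big)\Big(\sum_{k=0}^{T-1}a_k\Big) \ge \frac{T^2}{n(1+2\alpha)}. \tag{5.6}$$

If we choose $T = n/\varepsilon^2$, we immediately have⁵

$$\sum_{k=0}^{T-1}(L_{e_k}\bullet Y_k)^{1/2} \ge \sum_{k=0}^{T-1}a_k \ge \frac{\sqrt n}{\varepsilon^2}(1 - O(\alpha)),$$

where the first inequality uses our choice $L_{e_k}\bullet X_k \le L_{e_k}\bullet Y_k$.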

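Substituting the above lower bound into (5.4), choosing $U_Y \in \Delta_{n\times n}$ to be the rank-1 projection matrix onto the smallest eigenvector of $M_Y$, and choosing $\alpha = \varepsilon$, we have

$$\lambda_{\min}(M_Y) \ge -\frac{2\sqrt n}{\alpha} + (1-2\alpha)\sum_{k=0}^{T-1}(L_{e_k}\bullet Y_k)^{1/2} \ge (1 - O(\varepsilon))\frac{\sqrt n}{\varepsilon^2}. \tag{5.7}$$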


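Upper-bounding $\lambda_{\max}(M_Y) - \lambda_{\min}(M_Y)$. This time, we use our choice of $L_{e_k}\bullet X_k \le L_{e_k}\bullet Y_k$ (which implies $M_Y \preceq M_X$) to combine (5.3) and (5.4) and derive that

$$\frac{1}{1+2\alpha}M_Y\bullet U_X \le \frac{1}{1+2\alpha}M_X\bullet U_X \le \frac{1}{1-2\alpha}M_Y\bullet U_Y + \frac{2\sqrt n}{\alpha}\Big(\frac{1}{1+2\alpha} + \frac{1}{1-2\alpha}\Big).$$

Choosing $U_X$ to be the rank-1 projection matrix onto the largest eigenvector of $M_Y$, $U_Y$ to be that onto the smallest eigenvector of $M_Y$, and recalling that $\alpha = \varepsilon$, we have

$$\lambda_{\max}(M_Y) \le \frac{1+2\varepsilon}{1-2\varepsilon}\lambda_{\min}(M_Y) + \frac{4\sqrt n}{\varepsilon}(1+O(\varepsilon)).$$

After rearranging and substituting in the lower bound (5.7), we finish the proof of (5.5):

$$\lambda_{\max}(M_Y) - \lambda_{\min}(M_Y) \le \frac{4\varepsilon}{1-2\varepsilon}\lambda_{\min}(M_Y) + \frac{4\sqrt n}{\varepsilon}(1+O(\varepsilon)) \le 8\varepsilon(1+O(\varepsilon))\,\lambda_{\min}(M_Y).$$

□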
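The two-sided algorithm can be sketched the same way. The illustrative Python below (our own helpers, again with a small Jacobi eigensolver) maintains $\sum_j s_j^X L_{e_j}$ and $\sum_j s_j^Y L_{e_j}$ with $s_j^X = \alpha/(L_{e_j}\bullet X_j)^{1/2}$ and $s_j^Y = \alpha/(L_{e_j}\bullet Y_j)^{1/2}$, which is what the updates (5.2) amount to under (3.2) with $q = 2$. At each step it picks $e_k$ minimizing $L_e\bullet X_k - L_e\bullet Y_k$ (nonpositive by the averaging argument) and accumulates $M_X$ and $M_Y$. The toy frame in $\mathbb{R}^2$ and the choice $\varepsilon = 0.05$ are hypothetical.

```python
import math


def jacobi_eigh(a, rel_tol=1e-26, max_sweeps=100):
    """Cyclic Jacobi eigensolver; returns (lam, v) with A = V diag(lam) V^T."""
    n = len(a)
    a = [list(map(float, row)) for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= rel_tol * sum(x * x for row in a for x in row):
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                a[p][q] = a[q][p] = 0.0
                for k in range(n):
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def l12_density(S):
    """Eigen-data of (S + tI)^{-2}, with t > -lambda_min(S) chosen so the trace is 1."""
    lam, u = jacobi_eigh(S)
    lo, hi = -min(lam) + 1.0, -min(lam) + math.sqrt(len(lam))
    for _ in range(100):
        t = 0.5 * (lo + hi)
        if sum((l + t) ** -2 for l in lam) > 1.0:
            lo = t
        else:
            hi = t
    t = 0.5 * (lo + hi)
    return [(l + t) ** -2 for l in lam], u


def dot_rank1(p, u, v):
    """(v v^T) . X for X = sum_i p_i u_i u_i^T."""
    n = len(v)
    return sum(p[i] * sum(v[r] * u[r][i] for r in range(n)) ** 2 for i in range(n))


def two_sided_sparsify(vecs, eps):
    """Section 5 sketch with alpha = eps and T = ceil(n / eps^2); returns (M_X, M_Y)."""
    n, m = len(vecs[0]), len(vecs)
    alpha, T = eps, math.ceil(n / eps ** 2)
    sx, sy, mx, my = ([[0.0] * n for _ in range(n)] for _ in range(4))
    for _ in range(T):
        px, ux = l12_density([[-x for x in row] for row in sx])  # X_k = (c^X I - sx)^{-2}
        py, uy = l12_density(sy)                                  # Y_k = (sy - c^Y I)^{-2}
        lx = [dot_rank1(px, ux, v) for v in vecs]
        ly = [dot_rank1(py, uy, v) for v in vecs]
        e = min(range(m), key=lambda i: lx[i] - ly[i])            # ensures L.X_k <= L.Y_k
        v = vecs[e]
        for r in range(n):
            for col in range(n):
                L = v[r] * v[col]
                sx[r][col] += alpha * L / math.sqrt(lx[e])
                sy[r][col] += alpha * L / math.sqrt(ly[e])
                mx[r][col] += L / math.sqrt(lx[e])
                my[r][col] += L / math.sqrt(ly[e])
    return mx, my


vecs = [[math.cos(2 * math.pi * j / 3), math.sin(2 * math.pi * j / 3)] for j in range(3)]
mx, my = two_sided_sparsify(vecs, eps=0.05)
lam_y = jacobi_eigh(my)[0]
ratio = max(lam_y) / min(lam_y)    # M_Y / lambda_min(M_Y) is the sparsifier in (5.1)
```

Dividing `my` by its smallest eigenvalue yields $s_e$ with $I \preceq \sum_e s_e L_e$, and `ratio` is then the upper constant in (5.1); on this toy frame it should sit well within the $1 + 8\varepsilon + O(\varepsilon^2)$ guarantee.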


6 Efficient Implementation for Graph Sparsification

The update rules described in (5.2) imply that $X_k$ and $Y_k$ are of the form (see Section 3)

$$X_k = \Big(c^X\cdot I - \sum_{j=0}^{k-1}s_j^X L_{e_j}\Big)^{-2} \quad\text{and}\quad Y_k = \Big(\sum_{j=0}^{k-1}s_j^Y L_{e_j} - c^Y\cdot I\Big)^{-2}. \tag{6.1}$$

Here, $c^X$ is the unique (positive) constant that satisfies $c^X I - \sum_{j=0}^{k-1}s_j^X L_{e_j} \succ 0$ and $\mathrm{Tr}X_k = 1$, while $c^Y$ is the unique (possibly negative) constant that satisfies $\sum_{j=0}^{k-1}s_j^Y L_{e_j} - c^Y I \succ 0$ and $\mathrm{Tr}Y_k = 1$.

⁵ In fact, it suffices to stop our algorithm at the earliest iteration $T$ such that inequality (5.6) is satisfied. Our analysis here only represents the most pessimistic scenario; in practice, this early termination implies that we can choose fewer than $n/\varepsilon^2$ matrices for certain inputs. This is in contrast to [BSS14], whose algorithm uses $n/\varepsilon^2$ rank-1 matrices for all inputs.
The coefficients $s_j^X = \alpha/(L_{e_j}\bullet X_j)^{1/2}$ and $s_j^Y = \alpha/(L_{e_j}\bullet Y_j)^{1/2}$ are always positive. (It is worth noting that $c^X$ is initially $\sqrt n$ at $X_0$ and keeps increasing, while $c^Y$ is initially $-\sqrt n$ and keeps increasing as well.)

Recall that MirrorDescent$_{\ell_{1/2}}$ requires one to compute $c^X$ and $c^Y$ at each iteration, and this can be done via binary search. One way to perform the binary search is to first compute $\lambda_{\max} = \lambda_{\max}\big(\sum_{j=0}^{k-1}s_j^X L_{e_j}\big)$.⁶ Then, one can binary search $c^X$ in the range $[\lambda_{\max} + 1, \lambda_{\max} + \sqrt n]$ to find the correct value satisfying $\mathrm{Tr}\big(c^X\cdot I - \sum_{j=0}^{k-1}s_j^X L_{e_j}\big)^{-2} = 1$. Similarly, one can binary search $c^Y$ in the range $[\lambda_{\min} - \sqrt n, \lambda_{\min} - 1]$, where $\lambda_{\min} = \lambda_{\min}\big(\sum_{j=0}^{k-1}s_j^Y L_{e_j}\big)$.

If one performs the binary search to a sufficiently small accuracy, this gives an algorithm whose running time is $O(n^3 m/\varepsilon^2)$, dominated by the computation of $X_k\bullet L_e = \big(c^X\cdot I - \sum_{j=0}^{k-1}s_j^X L_{e_j}\big)^{-2}\bullet L_e$ for each $k \in [T]$ and $e \in [m]$.


Running Time Improvement. For the graph sparsification problem described in Theorem 1, we sketch the key ideas needed to improve the running time to $O(mn^{1+1/q}/\varepsilon^5)$ for any even integer $q \ge 2$. The details can be found in Appendices F and G. In particular, we first describe how to achieve a running time of $O(mn^{1+1/2}/\varepsilon^5)$.

Recall that in Section 5, we constructed $M_X$ and $M_Y$ and proved that $\lambda_{\min}(M_X)$ and $\lambda_{\min}(M_Y)$ are both at least $\Omega(\sqrt n/\varepsilon^2)$. In fact, it is not hard to ensure that $\lambda_{\max}(M_X)$ and $\lambda_{\max}(M_Y)$ are at most $O(\sqrt n/\varepsilon^2)$ as well. Since $\sum_{j=0}^{k-1}s_j^X L_{e_j} \preceq \alpha M_X$, we conclude that the eigenvalues of $\sum_{j=0}^{k-1}s_j^X L_{e_j}$ are all upper bounded by $\alpha\cdot O(\sqrt n/\varepsilon^2) = O(\sqrt n/\varepsilon)$. Therefore, throughout the algorithm, the encountered values of $c^X$ are always upper bounded by $O(\sqrt n/\varepsilon)$.

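For this reason, we only need to compute matrix inversions of the form $(cI - A)^{-1}$, with the guarantee that $c = O(\sqrt{n}/\varepsilon)$. We always have $cI - A \succeq I$, because otherwise $\mathrm{Tr}(cI - A)^{-2}$ would be strictly larger than 1. We can therefore approximate this matrix inverse by

$$(cI - A)^{-1} \approx \sum_{i=0}^{d} \frac{A^i}{c^{i+1}}, \tag{6.2}$$

and it suffices to choose the maximum degree $d = \tilde O(\sqrt{n}/\varepsilon)$. (Since $0 \preceq A \preceq (c-1)I$, every eigenvalue of the truncation error is at most $(1 - 1/c)^{d+1}$, so a degree of order $c$, up to logarithmic factors, suffices.) This is formally proved in Lemma G.6. In other words, when computing $X_k$, it suffices to replace the matrix inversion with a matrix polynomial of degree $d = \tilde O(\sqrt{n}/\varepsilon)$. A similar idea also holds for the $Y_k$ sequence.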
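The sketch below illustrates the truncation, assuming that (6.2) is the truncated Neumann series $\sum_{i=0}^{d} A^i/c^{i+1}$. It applies the polynomial to a vector using only $d$ matrix-vector products. In the paper each such product would involve a Laplacian solver, which this sketch does not model. The degree $d$ is chosen from the eigenvalue-wise error bound $(1 - 1/c)^{d+1}$, which holds when $A \succeq 0$ and $cI - A \succeq I$. The helper names are hypothetical.

```python
import math


def matvec(A, x):
    """Dense matrix-vector product for a list-of-lists matrix."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def degree_for(c, delta):
    """Smallest d with (1 - 1/c)^(d+1) <= delta.

    If A >= 0 and cI - A >= I, every eigenvalue of the truncation error
    (A/c)^(d+1) (cI - A)^{-1} is at most (1 - 1/c)^(d+1).
    """
    if c <= 1.0:
        return 0  # then A = 0 and the single term v / c is exact
    return max(0, math.ceil(math.log(delta) / math.log(1.0 - 1.0 / c)) - 1)


def neumann_apply(A, v, c, d):
    """Approximate (cI - A)^{-1} v by sum_{i=0}^{d} A^i v / c^(i+1)."""
    term = [x / c for x in v]                     # i = 0 term: v / c
    total = list(term)
    for _ in range(d):
        term = [x / c for x in matvec(A, term)]   # next term: A^i v / c^(i+1)
        total = [t + s for t, s in zip(total, term)]
    return total


# A has eigenvalues 0 and 2; with c = 3 we have cI - A >= I.
A = [[1.0, 1.0], [1.0, 1.0]]
approx = neumann_apply(A, [1.0, 0.0], 3.0, degree_for(3.0, 1e-9))
# exact answer: (3I - A)^{-1} [1, 0] = [2/3, 1/3]
```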

So far, we have managed to avoid the computationally expensive matrix inversion. Next, we want to further accelerate the computation of $(cI - A)^{-2} \bullet L_e$ for all edges $e \in [m]$ simultaneously. Recall that $L_e = v_e v_e^\top$ is of rank 1, and one can rewrite

$$(cI - A)^{-2} \bullet L_e = v_e^\top (cI - A)^{-2} v_e = \big\|(cI - A)^{-1} v_e\big\|_2^2 .$$

For this reason, as in [SS11], one can apply the Johnson–Lindenstrauss dimension reduction [JL84]: there exists a random matrix $Q$ with $\tilde O(1/\varepsilon^2)$ rows such that, with high probability, $\|(cI - A)^{-1} v_e\|_2^2 \approx \|Q(cI - A)^{-1} v_e\|_2^2$ for all $v_e$.
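Here is a minimal sketch of the Johnson–Lindenstrauss step. It assumes a dense Gaussian sketch with $k$ rows scaled by $1/\sqrt{k}$. This is one standard construction, not necessarily the one the paper uses, and the constant in `rows_for` is an assumption. For each vector $y$, $\|Qy\|_2^2$ is an unbiased estimate of $\|y\|_2^2$ that concentrates as $k$ grows. In the text, $y = (cI - A)^{-1} v_e$.

```python
import math
import random


def rows_for(eps, num_vectors, const=8.0):
    """Number of sketch rows: O(log(num_vectors) / eps^2) with an assumed constant."""
    return math.ceil(const * math.log(max(num_vectors, 2)) / eps ** 2)


def gaussian_sketch(k, n, rng):
    """k x n matrix with i.i.d. N(0, 1/k) entries."""
    s = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * s for _ in range(n)] for _ in range(k)]


def sq_norm(x):
    return sum(t * t for t in x)


def sketched_sq_norms(Q, vectors):
    """Estimate ||y||^2 by ||Q y||^2 for every y in vectors."""
    return [sq_norm([sum(q * t for q, t in zip(row, y)) for row in Q]) for y in vectors]


rng = random.Random(0)
n, num = 40, 20
ys = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(num)]
Q = gaussian_sketch(rows_for(0.25, num), n, rng)
ratios = [est / sq_norm(y) for est, y in zip(sketched_sq_norms(Q, ys), ys)]
```

Once $T = QM$ is formed for the matrix $M$ in question, each estimate costs only $O(k)$ per vector. The precomputation described next exploits exactly this.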

Using this dimension reduction, one can precompute $T = Q(cI - A)^{-1}$ in time $\tilde O(m/\varepsilon^2) \times \tilde O(\sqrt{n}/\varepsilon) = \tilde O(m\sqrt{n}/\varepsilon^3)$, with help from the approximate matrix inversion (6.2) and the nearly-linear-time Laplacian system solvers [ST04]. After the precomputation, each $(cI - A)^{-2} \bullet L_e \approx \|T v_e\|_2^2$ can be computed in $\tilde O(1/\varepsilon^2)$ time, totaling $\tilde O(m/\varepsilon^2)$ per iteration, which is negligible.

In sum, since there are $T = n/\varepsilon^2$ iterations (here $T$ denotes the iteration count, not the sketched matrix above), the total running time is $\tilde O(mn^{1+1/2}/\varepsilon^5)$. To improve this $\tilde O(mn^{1+1/2}/\varepsilon^5)$ to $\tilde O(mn^{1+1/q}\cdot\mathrm{poly}(1/\varepsilon))$ for any constant $q$, we need to replace the $\ell_{1/2}$ regularizer with the $\ell_{1-1/q}$ regularizer. This requires one to use Theorem 3.3 in place of Theorem 3.2.


6 $\lambda_{\max}$ and $\lambda_{\min}$ can be computed via power methods, and it suffices to compute them up to an additive error of, say, 0.1. In Appendix G, we propose an alternative approach to compute $c^X$ and $c^Y$ that avoids power methods.
7 This may require one to stop the algorithm earlier than $T = n/\varepsilon^2$ iterations, which is even better!
We wish to emphasize here that our analysis in Section 5 needs to be strengthened in order to tolerate all the errors incurred by the approximate computations (most notably from the Laplacian linear solvers, from Johnson–Lindenstrauss, and from (6.2)). Thanks to the optimization motivation behind our argument, this strengthening is routine, and we have carried it out carefully in Appendix F.
Appendix


Appendix roadmap. 

• In Figure 1, we plot the entropy and the $\ell_{1/2}$ regularizers in the 3-dimensional scalar case for a visual comparison.

• In Appendix A, we verify the equivalence between FTRL and MirrorDescent for our choices of the regularizers.

• In Appendix B, we provide notations for graphs, and state the reduction from sparsifying graphs to sparsifying sums of rank-1 matrices.

• In Appendix C, we provide our unweighted sparsification result.

• In Appendix D, we prove Lemma 2.1.

• In Appendix E, we provide the missing proofs from Section 3.

• In Appendix F, we generalize our linear-sized sparsification result to $\ell_{1-1/q}$ regularizers with $q \ge 2$, high-rank matrices, and approximate computations.

• In Appendix G, we provide the details of how to implement linear-sized graph sparsifications 
in almost-quadratic time, thus finishing the running time claim of Theorem 1. 

• In Appendix H, we sketch how to generalize our running time improvement to other problems, 
including sparsifying sums of rank-1 PSD matrices (i.e., Theorem F.5), as well as subgraph 
sparsifications. 



(a) The entropy regularizer (b) The $\ell_{1/2}$ regularizer

Figure 1: Two regularizers in $n = 3$. The first two axes represent $x_1, x_2$, so $x_3 = 1 - x_1 - x_2$. The third axis represents $w(x)$.
A Partial Equivalence Between FTRL and Mirror Descent

In this section, we show the equivalence between mirror descent and follow-the-regularized-leader for our choices of the regularizers. In fact, this equivalence holds more generally for all regularizers $w(\cdot)$ that are convex functions of Legendre type with domain $\Omega$ (see for instance [BMDG05, Roc96]).

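Letting $A_i$ be any symmetric $n \times n$ matrix for each iteration $i$, the follow-the-regularized-leader method can be described as

$$\forall k = 0, 1, \dots, T-1, \quad \tilde X_k = \operatorname*{arg\,min}_{Z \in \Delta_{n\times n}} \Big\{ w(Z) + \sum_{i=0}^{k-1} \langle A_i, Z \rangle \Big\}. \tag{A.1}$$

The mirror descent method (with starting point $X_0 = \frac{1}{n} I$) can be described as

$$\forall k = 1, \dots, T-1, \quad X_k = \operatorname*{arg\,min}_{Z \in \Delta_{n\times n}} \Big\{ V_{X_{k-1}}(Z) + \langle A_{k-1}, Z \rangle \Big\}, \tag{A.2}$$

where, as before, $V_X(Y) \stackrel{\mathrm{def}}{=} w(Y) - \langle \nabla w(X), Y - X \rangle - w(X)$ is the Bregman divergence of $w(\cdot)$.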

Recall that when $w(X) = X \bullet (\log X - I)$ is the entropy regularizer, we have $\nabla w(X) = \log X$ and therefore $(\nabla w)^{-1}(A) = e^{A}$. When $w(X) = -\frac{q}{q-1}\mathrm{Tr}\,X^{1-1/q}$ is the $\ell_{1-1/q}$ regularizer, we have $\nabla w(X) = -X^{-1/q}$ and therefore $(\nabla w)^{-1}(A) = (-A)^{-q}$ for $A \prec 0$. The rest of the proof holds for both types of regularizers.

To compute the minimizer $\tilde X_k$ of (A.1), one can take the derivative and demand that $\nabla w(\tilde X_k) + \sum_{i=0}^{k-1} A_i - c_k I = 0$. Here, the extra term $-c_k I$ comes from the Lagrange multiplier of the linear constraint $\mathrm{Tr}(Z) = I \bullet Z = 1$. (We do not need a Lagrange multiplier for the other constraint $Z \succeq 0$ because the gradient $\nabla w(Z)$ acts as a barrier: its norm tends to infinity as any eigenvalue of $Z$ tends to zero.) It is now easy to see that $c_k$ is the unique constant that ensures $\sum_{i=0}^{k-1} A_i - c_k I \succ 0$ (because $\nabla w(\tilde X_k) \prec 0$) and $\mathrm{Tr}\,\tilde X_k = \mathrm{Tr}\big((\nabla w)^{-1}(c_k I - \sum_{i=0}^{k-1} A_i)\big) = 1$.

To compute the minimizer $X_k$ of (A.2), one can take the derivative and demand that $\nabla w(X_k) - \nabla w(X_{k-1}) + A_{k-1} - d_k I = \nabla V_{X_{k-1}}(X_k) + A_{k-1} - d_k I = 0$. Here, the extra term $-d_k I$ again comes from the Lagrange multiplier of the linear constraint $\mathrm{Tr}(Z) = I \bullet Z = 1$. It is now easy to see that $d_k$ is the unique constant that ensures $-\nabla w(X_{k-1}) + A_{k-1} - d_k I \succ 0$ (because $\nabla w(X_k) \prec 0$) and $\mathrm{Tr}\,X_k = \mathrm{Tr}\big((\nabla w)^{-1}(\nabla w(X_{k-1}) + d_k I - A_{k-1})\big) = 1$.

To show the equivalence between (A.1) and (A.2), we perform a simple induction. Suppose that $\tilde X_{k-1} = X_{k-1}$; we wish to prove $\tilde X_k = X_k$.

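In this case, using $\nabla w(\tilde X_{k-1}) = c_{k-1} I - \sum_{i=0}^{k-2} A_i$, we have

$$X_k = (\nabla w)^{-1}\big(\nabla w(X_{k-1}) + d_k I - A_{k-1}\big) = (\nabla w)^{-1}\big(\nabla w(\tilde X_{k-1}) + d_k I - A_{k-1}\big) = (\nabla w)^{-1}\Big(c_{k-1} I + d_k I - \sum_{i=0}^{k-1} A_i\Big),$$

and

$$\tilde X_k = (\nabla w)^{-1}\Big(c_k I - \sum_{i=0}^{k-1} A_i\Big).$$

Finally, $d_k$ is the unique constant that ensures $\sum_{i=0}^{k-1} A_i - c_{k-1} I - d_k I \succ 0$ and $\mathrm{Tr}\big((\nabla w)^{-1}(c_{k-1} I + d_k I - \sum_{i=0}^{k-1} A_i)\big) = 1$. Likewise, $c_k$ is the unique constant that ensures $\sum_{i=0}^{k-1} A_i - c_k I \succ 0$ and $\mathrm{Tr}\big((\nabla w)^{-1}(c_k I - \sum_{i=0}^{k-1} A_i)\big) = 1$. Comparing the two characterizations, $c_k = c_{k-1} + d_k$, and therefore $X_k = \tilde X_k$.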
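As a numerical sanity check of this equivalence, the sketch below runs (A.1) and (A.2) side by side. It uses diagonal loss matrices $A_i$ and the $\ell_{1/2}$ regularizer ($q = 2$). In that case every iterate is diagonal and $(\nabla w)^{-1}(B) = (-B)^{-2}$ acts entrywise. Restricting to diagonal matrices is a simplification for illustration only. The Lagrange constants $c_k$ and $d_k$ are found by bisection, and all names are hypothetical.

```python
import math
import random


def solve_shift(base, tol=1e-12):
    """Return t < min(base) with sum_j (base_j - t)^(-2) = 1, by bisection.

    The sum increases with t on (-inf, min(base)); it is <= 1 at
    t = min(base) - sqrt(n) and >= 1 at t = min(base) - 1.
    """
    b_min = min(base)
    lo, hi = b_min - math.sqrt(len(base)), b_min - 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sum((b - mid) ** -2 for b in base) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def ftrl(losses, n):
    """(A.1) with the l_{1/2} regularizer: x_k = (S_k - c_k)^(-2), S_k = sum_{i<k} a_i."""
    iterates, cum = [], [0.0] * n
    for k in range(len(losses)):
        c = solve_shift(cum)                       # this is c_k
        iterates.append([(s - c) ** -2 for s in cum])
        cum = [s + a for s, a in zip(cum, losses[k])]
    return iterates


def mirror_descent(losses, n):
    """(A.2): x_k = (x_{k-1}^(-1/2) + a_{k-1} - d_k)^(-2), starting from x_0 = 1/n."""
    x = [1.0 / n] * n
    iterates = [x]
    for k in range(1, len(losses)):
        base = [xi ** -0.5 + a for xi, a in zip(x, losses[k - 1])]
        d = solve_shift(base)                      # this is d_k = c_k - c_{k-1}
        x = [(b - d) ** -2 for b in base]
        iterates.append(x)
    return iterates


rng = random.Random(1)
n, T = 4, 8
losses = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(T)]
gap = max(abs(p - r)
          for xf, xm in zip(ftrl(losses, n), mirror_descent(losses, n))
          for p, r in zip(xf, xm))
```

The two sequences agree up to bisection error, as the induction predicts.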

B Graph Notations 

Let $G = (V, E, w)$ be an undirected weighted graph with $n$ vertices and $m$ edges, where each $w_e > 0$ is the weight of edge $e$. Without loss of generality, we study only connected graphs throughout this paper. For every edge $e = (a, b) \in E$, we orient it arbitrarily and denote by $\chi_e = \mathbf{e}_a - \mathbf{e}_b \in \mathbb{R}^n$ the characteristic (column) vector of edge $e$. Let $L_e \stackrel{\mathrm{def}}{=} w_e \chi_e \chi_e^\top \in \mathbb{R}^{n\times n}$ be the graph Laplacian of edge $e$, or the edge Laplacian. Let $B \in \mathbb{R}^{m\times n}$ be the incidence matrix whose row corresponding to edge $e$ is the characteristic (row) vector $\chi_e^\top$. Define $W = \mathrm{diag}\{w_e\}_{e\in E}$ to be the diagonal matrix of edge weights. The Laplacian of graph $G$ is $L_G = B^\top W B \in \mathbb{R}^{n\times n}$. It is clear from the definition that $L_G \succeq 0$ is PSD and $L_G = \sum_{e\in E} L_e$. Notice that $\ker(L_G) = \ker(W^{1/2} B) = \mathrm{span}(\mathbf{1})$, and therefore $x^\top L_G x = 0$ if and only if $x$ is a constant vector.

Since $L_G$ is symmetric, one can diagonalize it and write $L_G = \sum_{i=1}^{n-1} \lambda_i v_i v_i^\top$, where the $\lambda_i$'s are the positive eigenvalues of $L_G$ and the $v_i$'s are the corresponding orthonormal eigenvectors. The Moore–Penrose pseudoinverse of $L_G$ is $L_G^{+} = \sum_{i=1}^{n-1} \frac{1}{\lambda_i} v_i v_i^\top$. For notational convenience, we will write $L_G^{-1}$ for this pseudoinverse, and often use $L_G^{-2}$ to denote $(L_G^{+})^2$, $L_G^{-1/2}$ to denote $(L_G^{+})^{1/2}$, and so on. We remark here that $L_G L_G^{-1} = L_G^{-1} L_G = \sum_{i=1}^{n-1} v_i v_i^\top = I_{\mathrm{im}(L_G)}$. Here, $I_{\mathrm{im}(L_G)}$ is the identity matrix on the image space of $L_G$, which is just the space spanned by all the vectors orthogonal to $\mathbf{1}$. For notational convenience, we will often abbreviate $I_{\mathrm{im}(L_G)}$ as $I$.

Throughout this paper, whenever related to graph sparsifications, we denote by

$$\widetilde{L}_e \stackrel{\mathrm{def}}{=} L_G^{-1/2} L_e L_G^{-1/2} \qquad \text{and} \qquad \widehat{L}_e \stackrel{\mathrm{def}}{=} \frac{L_G^{-1/2} L_e L_G^{-1/2}}{L_G^{-1} \bullet L_e} .$$

Above, $\widetilde{L}_e$ is the normalized edge Laplacian, and $\widehat{L}_e$ is the normalized edge Laplacian scaled by the effective resistance. ($L_G^{-1} \bullet L_e$ is the "effective resistance" of the edge $e$; see for instance [SS11].) Both of them have rank 1, and they satisfy $\mathrm{Tr}(\widetilde{L}_e) \le 1$ and $\widetilde{L}_e \preceq I$, while $\mathrm{Tr}(\widehat{L}_e) = 1$ and $\widehat{L}_e \preceq I$.

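It is easy to check from the above definitions that $\sum_e \widetilde{L}_e = I_{\mathrm{im}(L_G)}$. In addition, letting (with a slight abuse of notation) $w_e = L_G^{-1} \bullet L_e$ be the effective resistance of edge $e$, we have $\sum_e w_e \widehat{L}_e = I_{\mathrm{im}(L_G)}$ as well. Notice that $\sum_e w_e = \mathrm{Tr}\, I_{\mathrm{im}(L_G)} = n - 1$, the dimension of $\mathrm{im}(L_G)$ (see [SS11]).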
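The identity $\sum_e L_G^{-1} \bullet L_e = n - 1$ is easy to check numerically. The sketch below builds $L_G$ for a small connected weighted graph. It computes the pseudoinverse through the identity $L_G^{+} = (L_G + J/n)^{-1} - J/n$, valid for connected graphs, where $J$ is the all-ones matrix. This trick and the helper names are our own choices, not the paper's.

```python
def laplacian(n, edges):
    """L_G = sum_e w_e chi_e chi_e^T for edges given as (a, b, w)."""
    L = [[0.0] * n for _ in range(n)]
    for a, b, w in edges:
        L[a][a] += w
        L[b][b] += w
        L[a][b] -= w
        L[b][a] -= w
    return L


def inverse(M):
    """Gauss-Jordan inversion with partial pivoting (small dense matrices only)."""
    n = len(M)
    A = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        p = A[col][col]
        A[col] = [x / p for x in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0.0:
                f = A[r][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
    return [row[n:] for row in A]


def pseudoinverse_connected(L):
    """L^+ = (L + J/n)^{-1} - J/n, valid when ker(L) = span(1)."""
    n = len(L)
    shifted = inverse([[L[i][j] + 1.0 / n for j in range(n)] for i in range(n)])
    return [[shifted[i][j] - 1.0 / n for j in range(n)] for i in range(n)]


def leverage_scores(n, edges):
    """L_G^{-1} . L_e = w_e * chi_e^T L_G^+ chi_e for every edge e."""
    P = pseudoinverse_connected(laplacian(n, edges))
    return [w * (P[a][a] + P[b][b] - 2.0 * P[a][b]) for a, b, w in edges]


edges = [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 1.0), (3, 0, 0.5), (0, 2, 1.0)]
print(round(sum(leverage_scores(4, edges)), 9))  # n - 1 = 3 for this connected graph
```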


From Graph Sparsification to Rank-1 Decomposition Sparsification. As originally shown in [BSS14], one can easily translate the problem of graph spectral sparsification (see Theorem 1) into that of sparsifying sums of rank-1 matrices (see Theorem 2). Indeed, $I_{\mathrm{im}(L_G)} = \sum_{e\in[m]} \widetilde{L}_e$ is a sum of rank-1 matrices. Suppose one can find scalars $s_e \ge 0$ (as per Theorem 2) that satisfy $I_{\mathrm{im}(L_G)} \preceq \sum_{e\in[m]} s_e \widetilde{L}_e \preceq (1+\varepsilon) I_{\mathrm{im}(L_G)}$. Multiplying on both sides by $L_G^{1/2}$ and using the definition of $\widetilde{L}_e$, this immediately implies $L_G \preceq \sum_{e\in[m]} s_e L_e \preceq (1+\varepsilon) L_G$.


C Weak Unweighted Sparsifier 

In this section, we consider the weak unweighted spectral sparsification problem very recently studied by Anderson, Gu and Melgaard [AGM14]: for any value $\kappa \in [1, m/n]$, find a $\kappa$-spectral sparsifier of $G$ containing $O(m/\kappa)$ distinct edges from $E$, without reweighting. We show that our regret minimization framework allows us to design a simple and almost-quadratic-time algorithm for this problem, improving upon the quartic time complexity of [AGM14].

Formally, given any weighted undirected graph $G = (V, E, w)$ with $n$ vertices and $m$ edges, and any value $\kappa \in [1, m/n]$, the task is to find a subset $E_0 \subseteq E$ containing $O(m/\kappa)$ distinct edges such that

$$\frac{1}{\kappa} L_G \preceq \sum_{e\in E_0} L_e \preceq L_G .$$

This is an unweighted sparsification problem because one is not allowed to reweight the edges in $E_0$, in contrast to Theorem 1; and we call it a weak sparsifier because $\kappa$ is usually large.
Similar to Appendix B, one can easily reduce this graph sparsification problem to sparsifying sums of rank-1 matrices. Given $m$ rank-1 PSD matrices $L_1, \dots, L_m \in \mathbb{R}^{n\times n}$ that satisfy $I = \sum_{e\in[m]} L_e$, and given some $\kappa \in [1, m/n]$, find a subset $E_0 \subseteq [m]$ with $O(m/\kappa)$ distinct elements satisfying $\sum_{e\in E_0} L_e \succeq \frac{1}{\kappa} I$. (The upper bound $\sum_{e\in E_0} L_e \preceq I$ holds automatically.)

(In this section, one may identify this $L_e$ with the normalized edge Laplacian $\widetilde{L}_e$ introduced in Appendix B; but $L_e$ need not coincide with any graph Laplacian in general.)

We solve this weak unweighted sparsification problem via the following reduction to regret 
minimization. 

If $\kappa \le 9$, we output $E_0 = E$ and are done. Otherwise, we choose the $\ell_{1/2}$ regularizer and parameter $\alpha = 4\sqrt{n}\,\kappa$ for $\mathtt{MirrorDescent}_{\ell_{1/2}}$. At each iteration $k = 0, 1, \dots, T-1$, we define $e_k$ to be the index $e \in [m]$ that maximizes the quantity

$$\frac{X_k \bullet L_e}{1 + X_k^{1/2} \bullet \alpha L_e}$$

among all edges not chosen before, i.e., all edges in $E \setminus \{e_0, e_1, \dots, e_{k-1}\}$. Next, we feed $F_k = L_{e_k}$ as the feedback matrix to $\mathtt{MirrorDescent}_{\ell_{1/2}}$, and compute $X_{k+1}$ for the next iteration.


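Let us now state a simple property of the selected matrix $L_{e_k}$ using an averaging argument:

Claim C.1. For each $k = 0, 1, \dots, T-1$, we either have $\sum_{j=0}^{k-1} L_{e_j} \succeq \frac{1}{\kappa} I$, or

$$\frac{X_k \bullet L_{e_k}}{1 + X_k^{1/2} \bullet \alpha L_{e_k}} > \frac{1}{6m} .$$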


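Proof. Let us recall that by the definition of $\mathtt{MirrorDescent}_{\ell_{1/2}}$, we have

$$X_k = \Big(\alpha \sum_{j=0}^{k-1} L_{e_j} - c_k I\Big)^{-2},$$

where $c_k$ is the unique constant that makes $\alpha \sum_{j=0}^{k-1} L_{e_j} - c_k I \succ 0$ and $\mathrm{Tr}\,X_k = 1$. Note that if $c_k/\alpha \ge \frac{1}{\kappa}$, then we already have $\sum_{j=0}^{k-1} L_{e_j} \succeq \frac{1}{\kappa} I$. Therefore, we can assume $c_k/\alpha < \frac{1}{\kappa}$ for the rest of the proof.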

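On one hand, we have

$$\sum_{e \notin \{e_0,\dots,e_{k-1}\}} X_k \bullet L_e = X_k \bullet \Big(I - \sum_{j=0}^{k-1} L_{e_j}\Big) = X_k \bullet \Big(I - \frac{c_k}{\alpha} I - \frac{X_k^{-1/2}}{\alpha}\Big) = \Big(1 - \frac{c_k}{\alpha}\Big) - \frac{\mathrm{Tr}\,X_k^{1/2}}{\alpha} \ge 1 - \frac{1}{\kappa} - \frac{\sqrt{n}}{\alpha} > \frac{5}{6}, \tag{C.1}$$

where the first inequality is due to $\mathrm{Tr}\,X_k^{1/2} \le \sqrt{n}$ together with $c_k/\alpha < 1/\kappa$, and the second inequality is due to our choice of $\alpha = 4\sqrt{n}\,\kappa$ and the fact that $\kappa > 9$.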

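On the other hand, we have

$$\sum_{e \notin \{e_0,\dots,e_{k-1}\}} \frac{1}{6m}\Big(1 + X_k^{1/2} \bullet \alpha L_e\Big) \le \frac{1}{6} + \frac{\alpha}{6m}\, X_k^{1/2} \bullet \sum_{e \notin \{e_0,\dots,e_{k-1}\}} L_e \le \frac{1}{6} + \frac{\alpha}{6m}\,\mathrm{Tr}\,X_k^{1/2} \le \frac{1}{6} + \frac{\alpha\sqrt{n}}{6m} = \frac{1}{6} + \frac{4n\kappa}{6m} \le \frac{5}{6}, \tag{C.2}$$

where the second inequality is because $\sum_{e\notin\{e_0,\dots,e_{k-1}\}} L_e \preceq \sum_{e\in[m]} L_e = I$, the third inequality is because $\mathrm{Tr}\,X_k^{1/2} \le \sqrt{n}$, and the fourth inequality is because $\kappa \le m/n$.

Combining (C.1) and (C.2), we conclude that there exists at least one index $e \in [m] \setminus \{e_0, \dots, e_{k-1}\}$ satisfying $X_k \bullet L_e > \frac{1}{6m}\big(1 + X_k^{1/2} \bullet \alpha L_e\big)$. Since $e_k$ maximizes this ratio, this finishes the proof of the claim. □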
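Now we are ready to apply Theorem 3.2, the regret bound, with our choice of $F_k = L_{e_k}$. Its hypothesis holds trivially because $F_k \succeq 0$. For every $U \in \Delta_{n\times n}$,

$$\begin{aligned}
\sum_{k=0}^{T-1} \langle L_{e_k}, U \rangle &\ge \sum_{k=0}^{T-1} \langle L_{e_k}, X_k \rangle - \alpha \sum_{k=0}^{T-1} \frac{(X_k \bullet L_{e_k})(X_k^{1/2} \bullet L_{e_k})}{1 + X_k^{1/2} \bullet \alpha L_{e_k}} - \frac{2\sqrt{n}}{\alpha} \\
&= \sum_{k=0}^{T-1} \langle L_{e_k}, X_k \rangle \Big(1 - \frac{X_k^{1/2} \bullet \alpha L_{e_k}}{1 + X_k^{1/2} \bullet \alpha L_{e_k}}\Big) - \frac{2\sqrt{n}}{\alpha}
= \sum_{k=0}^{T-1} \frac{X_k \bullet L_{e_k}}{1 + X_k^{1/2} \bullet \alpha L_{e_k}} - \frac{2\sqrt{n}}{\alpha}.
\end{aligned} \tag{C.3}$$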

We will now choose $T = 9m/\kappa$. (Notice that $T < m$ because $\kappa > 9$.) There are two possibilities according to Claim C.1.

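In the first case, we have $\sum_{j=0}^{k-1} L_{e_j} \succeq \frac{1}{\kappa} I$ for some $k = 0, 1, \dots, T-1$, and we are done: defining $E_0 = \{e_0, e_1, \dots, e_{k-1}\}$, we have $|E_0| \le T = O(m/\kappa)$ and $I \succeq \sum_{e\in E_0} L_e \succeq \frac{1}{\kappa} I$.

In the second case, we have

$$\frac{X_k \bullet L_{e_k}}{1 + X_k^{1/2} \bullet \alpha L_{e_k}} > \frac{1}{6m}$$

for all $k = 0, 1, \dots, T-1$. Substituting this into (C.3), and choosing $U = uu^\top$ for a unit eigenvector $u$ corresponding to the smallest eigenvalue of $\sum_{k=0}^{T-1} L_{e_k}$, we conclude that

$$\lambda_{\min}\Big(\sum_{k=0}^{T-1} L_{e_k}\Big) \ge \frac{T}{6m} - \frac{2\sqrt{n}}{\alpha} = \frac{3}{2\kappa} - \frac{1}{2\kappa} = \frac{1}{\kappa}.$$

Therefore, defining $E_0 = \{e_0, e_1, \dots, e_{T-1}\}$, we also have $|E_0| = T = O(m/\kappa)$ and $I \succeq \sum_{e\in E_0} L_e \succeq \frac{1}{\kappa} I$.

In sum,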


Theorem C.2. Given a decomposition $I = \sum_{e\in[m]} L_e$ of rank-1 PSD matrices, and given some $\kappa \in [1, m/n]$, the above algorithm finds a subset $E_0 \subseteq [m]$ with $O(m/\kappa)$ distinct elements satisfying

$$\frac{1}{\kappa} I \preceq \sum_{e\in E_0} L_e \preceq I .$$
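The numerical steps in Claim C.1 and in the second case above reduce to a few exact identities once $\sqrt{n}/\alpha$ and $T/m$ are substituted. The check below uses exact rational arithmetic. It assumes the parameter choices $\alpha = 4\sqrt{n}\,\kappa$ and $T = 9m/\kappa$, as we read them from the text. It checks the arithmetic only, not the matrix argument.

```python
from fractions import Fraction


def weak_sparsifier_constants(kappa):
    """Exact values of the bounds in (C.1), (C.2) and the final lambda_min bound."""
    k = Fraction(kappa)
    sqrt_n_over_alpha = 1 / (4 * k)               # sqrt(n)/alpha with alpha = 4 sqrt(n) kappa
    c1_lower = 1 - 1 / k - sqrt_n_over_alpha      # right-hand side of (C.1)
    c2_upper = Fraction(1, 6) + Fraction(4, 6)    # (C.2) after using 4 n kappa / (6 m) <= 4/6
    T_over_m = 9 / k                              # T = 9 m / kappa
    # each of the T steps gains more than 1/(6m); then subtract 2 sqrt(n)/alpha as in (C.3)
    lam_min_lower = T_over_m * Fraction(1, 6) - 2 * sqrt_n_over_alpha
    return c1_lower, c2_upper, lam_min_lower


for kappa in (9, 10, 25, 100):
    c1, c2, lam = weak_sparsifier_constants(kappa)
    # (C.1) must strictly exceed (C.2), and the final bound must equal 1/kappa
    print(kappa, c1 > c2, lam == Fraction(1, kappa))
```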

We remark here that for graph sparsification, the above algorithm can be implemented to run in time $\tilde O(m^{3/2} n)$, and can be improved to $\tilde O(m^{1+1/q} n)$ for any even integer constant $q \ge 2$ if the $\ell_{1-1/q}$ regularizer is used instead of $\ell_{1/2}$. We omit the implementation details in this version of the paper because they are very similar to those discussed in Section 6.


D Proof of Lemma 2.1 

We state some classical properties of the Bregman divergence, which can be found for instance in [CL06].

Lemma 2.1. The following properties hold for the Bregman divergence.

• Non-negativity: $V_X(Y) \ge 0$ for all $X, Y \succ 0$.

• The "three-point equality": $\langle \nabla w(X) - \nabla w(Y), X - U \rangle = V_X(U) - V_Y(U) + V_Y(X)$.

• Given $X \succ 0$ and $\tilde X = \operatorname*{arg\,min}_{Z \in \Delta_{n\times n}} V_X(Z)$ as the Bregman projection, we have the "generalized Pythagorean theorem" for all $U \in \Delta_{n\times n}$: $V_X(U) \ge V_{\tilde X}(U) + V_X(\tilde X) \ge V_{\tilde X}(U)$.

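Proof. The non-negativity follows by definition from the convexity of $w(X)$. For every $U \succeq 0$, the "three-point equality" follows from the following identity:

$$\begin{aligned}
\langle \nabla w(X) - \nabla w(Y), X - U \rangle &= \big(w(U) - w(X) - \langle \nabla w(X), U - X \rangle\big) - \big(w(U) - w(Y) - \langle \nabla w(Y), U - Y \rangle\big) \\
&\quad + \big(w(X) - w(Y) - \langle \nabla w(Y), X - Y \rangle\big) \\
&= V_X(U) - V_Y(U) + V_Y(X) .
\end{aligned}$$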
For the generalized Pythagorean theorem, we only need to prove $V_X(U) \ge V_{\tilde X}(U) + V_X(\tilde X)$, because the second inequality follows from the non-negativity of $V_X(\tilde X)$. To provide the simplest proof, we only focus on the special case when $w(X) = -\frac{q}{q-1}\mathrm{Tr}\,X^{1-1/q}$. (The proof for the entropy regularizer is similar, while the proof for the most general Legendre function case is more involved.)

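By definition,

$$\begin{aligned}
V_{\tilde X}(U) + V_X(\tilde X) &= \tilde X^{-1/q} \bullet U + \frac{1}{q-1}\mathrm{Tr}\,\tilde X^{1-1/q} - \frac{q}{q-1}\mathrm{Tr}\,U^{1-1/q} \\
&\quad + X^{-1/q} \bullet \tilde X + \frac{1}{q-1}\mathrm{Tr}\,X^{1-1/q} - \frac{q}{q-1}\mathrm{Tr}\,\tilde X^{1-1/q}, \\
V_X(U) &= X^{-1/q} \bullet U + \frac{1}{q-1}\mathrm{Tr}\,X^{1-1/q} - \frac{q}{q-1}\mathrm{Tr}\,U^{1-1/q} .
\end{aligned}$$

Therefore, using $\mathrm{Tr}\,\tilde X^{1-1/q} = \tilde X^{-1/q} \bullet \tilde X$,

$$V_X(U) - \big(V_{\tilde X}(U) + V_X(\tilde X)\big) = X^{-1/q} \bullet U - \tilde X^{-1/q} \bullet U - X^{-1/q} \bullet \tilde X + \mathrm{Tr}\,\tilde X^{1-1/q} = \big(X^{-1/q} - \tilde X^{-1/q}\big) \bullet \big(U - \tilde X\big) .$$

Since $V_X(Z)$ is a convex function of $Z$ and $\tilde X = \operatorname*{arg\,min}_{Z \in \Delta_{n\times n}} V_X(Z)$, for any $U \in \Delta_{n\times n}$ we must have

$$\langle \nabla V_X(\tilde X), U - \tilde X \rangle \ge 0 \iff \langle -\tilde X^{-1/q} + X^{-1/q}, U - \tilde X \rangle \ge 0 .$$

This concludes the proof of the lemma.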


□ 
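The sketch below checks the three-point equality and the generalized Pythagorean theorem numerically. It uses the $\ell_{1-1/q}$ regularizer restricted to diagonal matrices (positive vectors), a simplification for illustration. The Bregman projection onto the simplex is $\tilde x = (x^{-1/q} - \mu)^{-q}$, with the scalar $\mu$ found by bisection. Here $X^{-1/q} - \tilde X^{-1/q} = \mu I$, so the final inner product in the proof equals $\mu\,\mathrm{Tr}(U - \tilde X) = 0$ and the Pythagorean inequality is tight up to rounding. The helper names are hypothetical.

```python
import random


def bregman(x, u, q):
    """V_x(u) for w(x) = -(q/(q-1)) * sum_i x_i^(1-1/q) on positive vectors."""
    p = 1.0 - 1.0 / q
    return (sum(xi ** (-1.0 / q) * ui for xi, ui in zip(x, u))
            + sum(xi ** p for xi in x) / (q - 1)
            - q / (q - 1) * sum(ui ** p for ui in u))


def grad_w(x, q):
    """Entrywise gradient -x^(-1/q)."""
    return [-(xi ** (-1.0 / q)) for xi in x]


def project(x, q, tol=1e-13):
    """Bregman projection of x onto the simplex: (x^(-1/q) - mu)^(-q) with sum 1."""
    base = [xi ** (-1.0 / q) for xi in x]
    b_min = min(base)
    # the sum is <= 1 at the left end and >= 1 at the right end
    lo, hi = b_min - len(x) ** (1.0 / q), b_min - 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sum((b - mid) ** (-q) for b in base) < 1.0:
            lo = mid
        else:
            hi = mid
    mu = 0.5 * (lo + hi)
    return [(b - mu) ** (-q) for b in base]


rng = random.Random(2)
q, n = 4, 5
x = [rng.uniform(0.05, 2.0) for _ in range(n)]
y = [rng.uniform(0.05, 2.0) for _ in range(n)]
raw = [rng.uniform(0.05, 1.0) for _ in range(n)]
u = [t / sum(raw) for t in raw]  # a point of the simplex

# three-point equality: <grad w(x) - grad w(y), x - u> = V_x(u) - V_y(u) + V_y(x)
lhs = sum((gx - gy) * (xi - ui)
          for gx, gy, xi, ui in zip(grad_w(x, q), grad_w(y, q), x, u))
rhs = bregman(x, u, q) - bregman(y, u, q) + bregman(y, x, q)

# generalized Pythagorean theorem: V_x(u) >= V_xt(u) + V_x(xt) for the projection xt
xt = project(x, q)
pyth_gap = bregman(x, u, q) - (bregman(xt, u, q) + bregman(x, xt, q))
```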


E Missing Proofs in Section 3 

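Theorem 3.1. In $\mathtt{MirrorDescent}_{\exp}$, if the parameter $\alpha > 0$ satisfies $\alpha F_k \succeq -I$ for all iterations $k = 0, 1, \dots, T-1$, then, for every $U \in \Delta_{n\times n}$,

$$R(U) \stackrel{\mathrm{def}}{=} \sum_{k=0}^{T-1} \langle F_k, X_k - U \rangle \le \alpha \sum_{k=0}^{T-1} (X_k \bullet |F_k|) \cdot \|F_k\|_{\mathrm{spe}} + \frac{V_{X_0}(U)}{\alpha} .$$

We note that $V_{X_0}(U) \le \log n$.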


Proof. We prove the theorem by using a two-step description of mirror descent. For every $k \ge 0$, define $\hat X_{k+1} = \operatorname*{arg\,min}_{Z \succeq 0}\{V_{X_k}(Z) + \alpha\langle F_k, Z \rangle\}$, where the minimization is over all $Z \succeq 0$ rather than $Z \in \Delta_{n\times n}$. This minimizer $\hat X_{k+1}$ certainly exists (and equals $\exp(\log X_k - \alpha F_k)$), and it is not hard to verify that $X_{k+1} = \operatorname*{arg\,min}_{Z \in \Delta_{n\times n}}\{V_{\hat X_{k+1}}(Z)\}$. In other words, one can describe the update $X_k \to X_{k+1}$ by adding an intermediate stage $X_k \to \hat X_{k+1} \to X_{k+1}$. We also assume that initially $\hat X_0 \stackrel{\mathrm{def}}{=} X_0$.

Notice that the definition of $\hat X_{k+1}$ implies $\nabla V_{X_k}(\hat X_{k+1}) + \alpha F_k = 0$, which by the definition of $V_X(Y)$ is equivalent to $\nabla w(X_k) - \nabla w(\hat X_{k+1}) = \alpha F_k$. Therefore,

$$\begin{aligned}
\langle \alpha F_k, X_k - U \rangle &= \langle \nabla w(X_k) - \nabla w(\hat X_{k+1}), X_k - U \rangle = V_{X_k}(U) - V_{\hat X_{k+1}}(U) + V_{\hat X_{k+1}}(X_k) \\
&\le V_{\hat X_k}(U) - V_{\hat X_{k+1}}(U) + V_{\hat X_{k+1}}(X_k) .
\end{aligned} \tag{E.1}$$

Above, the second equality is due to the three-point equality and the only inequality is due to the 
generalized Pythagorean theorem of Bregman divergence (see Lemma 2.1). Now, 

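$$\begin{aligned}
V_{\hat X_{k+1}}(X_k) &= X_k \bullet (\log X_k - \log \hat X_{k+1}) + \mathrm{Tr}\,\hat X_{k+1} - \mathrm{Tr}\,X_k \\
&= X_k \bullet \alpha F_k + \mathrm{Tr}\big(e^{\log X_k - \alpha F_k}\big) - \mathrm{Tr}\,X_k \overset{①}{\le} X_k \bullet \alpha F_k + X_k \bullet e^{-\alpha F_k} - \mathrm{Tr}\,X_k \\
&\overset{②}{\le} X_k \bullet \alpha F_k + X_k \bullet (I - \alpha F_k + \alpha^2 F_k^2) - \mathrm{Tr}\,X_k = \alpha^2\, X_k \bullet F_k^2 \overset{③}{\le} \alpha^2\, (X_k \bullet |F_k|)\, \|F_k\|_{\mathrm{spe}} .
\end{aligned}$$

Above, ① is due to the Golden–Thompson inequality. ② follows because $e^{-\alpha F_k} \preceq I - \alpha F_k + \alpha^2 F_k^2$, which can be proved by transforming into its eigenbasis and using the fact that $e^{-a} \le 1 - a + a^2$ for all $a \ge -1$. ③ follows because $F_k^2 \preceq \|F_k\|_{\mathrm{spe}} \cdot |F_k|$.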

Finally, substituting the above upper bound into (E.1) and telescoping it for $k = 0, \dots, T-1$, we obtain

$$\sum_{k=0}^{T-1} \langle F_k, X_k - U \rangle \le \frac{V_{\hat X_0}(U) - V_{\hat X_T}(U)}{\alpha} + \alpha \sum_{k=0}^{T-1} (X_k \bullet |F_k|)\, \|F_k\|_{\mathrm{spe}} .$$

The desired result of this theorem now follows from the above inequality, the simple upper bound $V_{\hat X_0}(U) = V_{X_0}(U) \le \log n$, and the non-negativity $V_{\hat X_T}(U) \ge 0$. □
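For diagonal loss matrices, $\mathtt{MirrorDescent}_{\exp}$ reduces to the classical multiplicative-weights (Hedge) update $x_{k+1} \propto x_k \odot e^{-\alpha f_k}$. The bound of Theorem 3.1 can then be checked empirically against the best fixed vertex $U = \mathbf{e}_i\mathbf{e}_i^\top$, for which $V_{X_0}(U) = \log n$. The diagonal specialization and the random losses are assumptions made for illustration.

```python
import math
import random


def hedge_regret_and_bound(losses, alpha):
    """Run diagonal MirrorDescent_exp; return (regret vs best vertex, Theorem 3.1 bound)."""
    n = len(losses[0])
    x = [1.0 / n] * n                     # X_0 = I/n
    alg_loss, bound_terms = 0.0, 0.0
    cum = [0.0] * n
    for f in losses:
        assert all(alpha * fi >= -1.0 for fi in f)  # hypothesis alpha F_k >= -I
        alg_loss += sum(fi * xi for fi, xi in zip(f, x))
        # (X_k . |F_k|) * ||F_k||_spe for a diagonal F_k
        bound_terms += sum(xi * abs(fi) for xi, fi in zip(x, f)) * max(abs(fi) for fi in f)
        cum = [c + fi for c, fi in zip(cum, f)]
        w = [xi * math.exp(-alpha * fi) for xi, fi in zip(x, f)]
        total = sum(w)
        x = [wi / total for wi in w]
    regret = alg_loss - min(cum)          # U = e_i e_i^T for the best coordinate i
    bound = alpha * bound_terms + math.log(n) / alpha
    return regret, bound


rng = random.Random(3)
losses = [[rng.uniform(-1.0, 1.0) for _ in range(6)] for _ in range(300)]
regret, bound = hedge_regret_and_bound(losses, alpha=0.2)
```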


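Theorem 3.2. In $\mathtt{MirrorDescent}_{\ell_{1/2}}$, suppose the parameter $\alpha > 0$, and the loss matrix $F_k$ is rank one and satisfies $X_k^{1/2} \bullet \alpha F_k > -1$ for all $k$. Then, for every $U \in \Delta_{n\times n}$,

$$R(U) \stackrel{\mathrm{def}}{=} \sum_{k=0}^{T-1} \langle F_k, X_k - U \rangle \le \alpha \sum_{k=0}^{T-1} \frac{(X_k \bullet F_k)(X_k^{1/2} \bullet F_k)}{1 + X_k^{1/2} \bullet \alpha F_k} + \frac{V_{X_0}(U)}{\alpha} .$$

If we instead have $X_k^{1/2} \bullet \alpha F_k \ge -\frac{1}{2}$, the above bound can be simplified as

$$R(U) = \sum_{k=0}^{T-1} \langle F_k, X_k - U \rangle \le 2\alpha \sum_{k=0}^{T-1} (X_k \bullet F_k)(X_k^{1/2} \bullet F_k) + \frac{V_{X_0}(U)}{\alpha} .$$

We note that $V_{X_0}(U) \le 2\sqrt{n}$.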


Proof. We prove the theorem by using a two-step description of mirror descent. For every $k \ge 0$, define $\hat X_{k+1} = \operatorname*{arg\,min}_{Z \succeq 0}\{V_{X_k}(Z) + \alpha\langle F_k, Z \rangle\}$, where the minimization is over all $Z \succeq 0$ rather than $Z \in \Delta_{n\times n}$. We claim that this minimizer $\hat X_{k+1}$ exists and is strictly positive definite, because one can choose $Z = \hat X_{k+1} = (X_k^{-1/2} + \alpha F_k)^{-2} \succ 0$ to make the gradient zero:

$$\nabla V_{X_k}(\hat X_{k+1}) + \alpha F_k = \nabla w(\hat X_{k+1}) - \nabla w(X_k) + \alpha F_k = -\hat X_{k+1}^{-1/2} + X_k^{-1/2} + \alpha F_k = 0 . \tag{E.2}$$

This uses our assumption $X_k^{1/2} \bullet \alpha F_k > -1$, which is equivalent to $\alpha F_k \succ -X_k^{-1/2}$,$^{8}$ so as to ensure that $\hat X_{k+1}$ is well defined.

Next, it is easy to verify that $X_{k+1} = \operatorname*{arg\,min}_{Z \in \Delta_{n\times n}}\{V_{\hat X_{k+1}}(Z)\}$. In other words, one can describe the update $X_k \to X_{k+1}$ by adding an intermediate stage $X_k \to \hat X_{k+1} \to X_{k+1}$. We assume for notational simplicity that $\hat X_0 \stackrel{\mathrm{def}}{=} X_0$.

Using (E.2), we easily obtain that

$$\begin{aligned}
\langle \alpha F_k, X_k - U \rangle &= \langle \nabla w(X_k) - \nabla w(\hat X_{k+1}), X_k - U \rangle = V_{X_k}(U) - V_{\hat X_{k+1}}(U) + V_{\hat X_{k+1}}(X_k) \\
&\le V_{\hat X_k}(U) - V_{\hat X_{k+1}}(U) + V_{\hat X_{k+1}}(X_k) .
\end{aligned} \tag{E.3}$$

Above, the second equality is due to the three-point equality and the only inequality is due to the 
generalized Pythagorean theorem of Bregman divergence (see Lemma 2.1). 

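We now exactly compute $V_{\hat X_{k+1}}(X_k)$ in two cases.

• If $\alpha F_k = -uu^\top$ is negative semidefinite, using the Sherman–Morrison formula,

$$\mathrm{Tr}\,\hat X_{k+1}^{1/2} = \mathrm{Tr}\big((X_k^{-1/2} - uu^\top)^{-1}\big) = \mathrm{Tr}\Big(X_k^{1/2} + \frac{X_k^{1/2} uu^\top X_k^{1/2}}{1 - u^\top X_k^{1/2} u}\Big) .$$

Therefore,

$$\begin{aligned}
V_{\hat X_{k+1}}(X_k) &= \hat X_{k+1}^{-1/2} \bullet X_k + \mathrm{Tr}\,\hat X_{k+1}^{1/2} - 2\,\mathrm{Tr}\,X_k^{1/2} = (X_k^{-1/2} - uu^\top) \bullet X_k + \mathrm{Tr}\,\hat X_{k+1}^{1/2} - 2\,\mathrm{Tr}\,X_k^{1/2} \\
&= -uu^\top \bullet X_k + \big(\mathrm{Tr}\,\hat X_{k+1}^{1/2} - \mathrm{Tr}\,X_k^{1/2}\big) = -u^\top X_k u + \frac{u^\top X_k u}{1 - u^\top X_k^{1/2} u} \\
&= \frac{u^\top X_k u \cdot u^\top X_k^{1/2} u}{1 - u^\top X_k^{1/2} u} = \alpha^2\, \frac{(X_k \bullet F_k)(X_k^{1/2} \bullet F_k)}{1 + X_k^{1/2} \bullet \alpha F_k} .
\end{aligned}$$

• If $\alpha F_k = uu^\top$ is positive semidefinite, using the Sherman–Morrison formula,

$$\mathrm{Tr}\,\hat X_{k+1}^{1/2} = \mathrm{Tr}\big((X_k^{-1/2} + uu^\top)^{-1}\big) = \mathrm{Tr}\Big(X_k^{1/2} - \frac{X_k^{1/2} uu^\top X_k^{1/2}}{1 + u^\top X_k^{1/2} u}\Big) .$$

Therefore,

$$\begin{aligned}
V_{\hat X_{k+1}}(X_k) &= \hat X_{k+1}^{-1/2} \bullet X_k + \mathrm{Tr}\,\hat X_{k+1}^{1/2} - 2\,\mathrm{Tr}\,X_k^{1/2} = (X_k^{-1/2} + uu^\top) \bullet X_k + \mathrm{Tr}\,\hat X_{k+1}^{1/2} - 2\,\mathrm{Tr}\,X_k^{1/2} \\
&= uu^\top \bullet X_k + \big(\mathrm{Tr}\,\hat X_{k+1}^{1/2} - \mathrm{Tr}\,X_k^{1/2}\big) = u^\top X_k u - \frac{u^\top X_k u}{1 + u^\top X_k^{1/2} u} \\
&= \frac{u^\top X_k u \cdot u^\top X_k^{1/2} u}{1 + u^\top X_k^{1/2} u} = \alpha^2\, \frac{(X_k \bullet F_k)(X_k^{1/2} \bullet F_k)}{1 + X_k^{1/2} \bullet \alpha F_k} .
\end{aligned}$$

8 This is because, if $F_k = -uu^\top$, then $X_k^{1/2} \bullet (-\alpha uu^\top) > -1$ is equivalent to $\alpha u^\top X_k^{1/2} u < 1$, which is further equivalent to $\alpha\,\mathrm{Tr}(X_k^{1/4} uu^\top X_k^{1/4}) < 1$. Since $X_k^{1/4} uu^\top X_k^{1/4}$ is a rank-1 matrix, this is finally equivalent to $\alpha uu^\top \prec X_k^{-1/2}$.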
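The closed form for $V_{\hat X_{k+1}}(X_k)$ can be checked numerically in both cases. The sketch below assumes a diagonal $X_k$, which keeps $X_k^{\pm 1/2}$ trivial while $uu^\top$ stays dense. It forms $\hat X_{k+1}^{-1/2} = X_k^{-1/2} \pm uu^\top$ explicitly and inverts it to obtain $\hat X_{k+1}^{1/2}$. It then compares the resulting Bregman divergence (with $q = 2$) against the formula above. The helper names are hypothetical.

```python
def inverse(M):
    """Gauss-Jordan inversion with partial pivoting (small dense matrices only)."""
    n = len(M)
    A = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        p = A[col][col]
        A[col] = [x / p for x in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0.0:
                f = A[r][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
    return [row[n:] for row in A]


def v_hat_exact(x, u, sign):
    """V_{Xhat}(X) for X = diag(x) and Xhat^{-1/2} = X^{-1/2} + sign * u u^T."""
    n = len(x)
    M = [[(x[i] ** -0.5 if i == j else 0.0) + sign * u[i] * u[j] for j in range(n)]
         for i in range(n)]
    Minv = inverse(M)                                 # = Xhat^{1/2}
    tr_hat_sqrt = sum(Minv[i][i] for i in range(n))
    m_dot_x = sum(M[i][i] * x[i] for i in range(n))   # Xhat^{-1/2} . X with X diagonal
    return m_dot_x + tr_hat_sqrt - 2.0 * sum(xi ** 0.5 for xi in x)


def v_hat_closed_form(x, u, sign):
    """(u^T X u)(u^T X^{1/2} u) / (1 + sign * u^T X^{1/2} u)."""
    uxu = sum(xi * ui * ui for xi, ui in zip(x, u))
    uxhu = sum(xi ** 0.5 * ui * ui for xi, ui in zip(x, u))
    return uxu * uxhu / (1.0 + sign * uxhu)


x = [0.4, 0.3, 0.2, 0.1]
u = [0.5, 0.3, -0.2, 0.4]   # u^T X^{1/2} u < 1, so the sign = -1 case is well defined
for sign in (-1.0, 1.0):
    print(sign, abs(v_hat_exact(x, u, sign) - v_hat_closed_form(x, u, sign)) < 1e-10)
```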


Finally, substituting the above computation of $V_{\hat X_{k+1}}(X_k)$ into (E.3) and telescoping it for $k = 0, \dots, T-1$, we obtain

$$\sum_{k=0}^{T-1} \langle F_k, X_k - U \rangle \le \frac{V_{\hat X_0}(U) - V_{\hat X_T}(U)}{\alpha} + \alpha \sum_{k=0}^{T-1} \frac{(X_k \bullet F_k)(X_k^{1/2} \bullet F_k)}{1 + X_k^{1/2} \bullet \alpha F_k} .$$

The desired result of this theorem now follows from the above inequality, the simple upper bound $V_{\hat X_0}(U) = V_{X_0}(U) \le 2\sqrt{n}$, and the non-negativity $V_{\hat X_T}(U) \ge 0$. □


The next theorem generalizes Theorem 3.2 to high-rank loss matrices and $\ell_{1-1/q}$ regularizers with $q \ge 2$. The key idea is to replace the Sherman–Morrison formula in the proof of Theorem 3.2 with the Woodbury formula, so as to allow $F_k$ to be of high rank. It also uses the Lieb–Thirring trace inequality to handle arbitrary $q \ge 2$.

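Theorem 3.3. In $\mathtt{MirrorDescent}_{\ell_{1-1/q}}$ with $q \ge 2$ and $\alpha > 0$, suppose the loss matrix $F_k$ is either positive or negative semidefinite and satisfies $\alpha\|X_k^{1/(2q)} F_k X_k^{1/(2q)}\|_{\mathrm{spe}} \le \frac{1}{2q}$ for all $k$. Then

$$\forall U \in \Delta_{n\times n}, \quad R(U) \stackrel{\mathrm{def}}{=} \sum_{k=0}^{T-1} \langle F_k, X_k - U \rangle \le O(q\alpha) \sum_{k=0}^{T-1} (X_k \bullet |F_k|) \cdot \|X_k^{1/(2q)} F_k X_k^{1/(2q)}\|_{\mathrm{spe}} + \frac{V_{X_0}(U)}{\alpha} .$$

We note that $V_{X_0}(U) \le \frac{q}{q-1} n^{1/q}$.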


Proof. We prove the theorem by using a two-step description of mirror descent. For every $k \ge 0$, define $\hat X_{k+1} = \operatorname*{arg\,min}_{Z \succeq 0}\{V_{X_k}(Z) + \alpha\langle F_k, Z \rangle\}$, where the minimization is over all $Z \succeq 0$ rather than $Z \in \Delta_{n\times n}$. We claim that this minimizer $\hat X_{k+1}$ exists and is strictly positive definite, because one can choose $Z = \hat X_{k+1} = (X_k^{-1/q} + \alpha F_k)^{-q} \succ 0$ to make the gradient zero:

$$\nabla V_{X_k}(\hat X_{k+1}) + \alpha F_k = \nabla w(\hat X_{k+1}) - \nabla w(X_k) + \alpha F_k = -\hat X_{k+1}^{-1/q} + X_k^{-1/q} + \alpha F_k = 0 . \tag{E.4}$$
This uses our assumption $\alpha\|X_k^{1/(2q)} F_k X_k^{1/(2q)}\|_{\mathrm{spe}} \le \frac{1}{2q}$, which certainly implies $\alpha F_k \succ -X_k^{-1/q}$, so as to ensure that $\hat X_{k+1}$ is well defined.

Next, it is easy to verify that $X_{k+1} = \operatorname*{arg\,min}_{Z \in \Delta_{n\times n}}\{V_{\hat X_{k+1}}(Z)\}$. In other words, one can describe the update $X_k \to X_{k+1}$ by adding an intermediate stage $X_k \to \hat X_{k+1} \to X_{k+1}$. We assume for notational simplicity that $\hat X_0 \stackrel{\mathrm{def}}{=} X_0$.

Using (E.4), we easily obtain that

$$\begin{aligned}
\langle \alpha F_k, X_k - U \rangle &= \langle \nabla w(X_k) - \nabla w(\hat X_{k+1}), X_k - U \rangle = V_{X_k}(U) - V_{\hat X_{k+1}}(U) + V_{\hat X_{k+1}}(X_k) \\
&\le V_{\hat X_k}(U) - V_{\hat X_{k+1}}(U) + V_{\hat X_{k+1}}(X_k) .
\end{aligned} \tag{E.5}$$

Above, the second equality is due to the three-point equality and the only inequality is due to the generalized Pythagorean theorem of Bregman divergence (see Lemma 2.1).

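We now upper bound $V_{\hat X_{k+1}}(X_k)$ in two cases: the case when $\alpha F_k = -PP^\top \preceq 0$ and the case when $\alpha F_k = PP^\top \succeq 0$. In both cases, we denote by $\beta \stackrel{\mathrm{def}}{=} \alpha\|X_k^{1/(2q)} F_k X_k^{1/(2q)}\|_{\mathrm{spe}} = \|X_k^{1/(2q)} PP^\top X_k^{1/(2q)}\|_{\mathrm{spe}}$. Notice that this implies$^{9}$

$$X_k^{1/(2q)} PP^\top X_k^{1/(2q)} \preceq \beta I \qquad \text{and} \qquad P^\top X_k^{1/q} P \preceq \beta I . \tag{E.6}$$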


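If $\alpha F_k = -PP^\top$, we have $X_k^{-1/q} \succ PP^\top$ and $\beta \le \frac{1}{2q}$ by our assumption, so using the Sherman–Morrison–Woodbury formula,

$$\mathrm{Tr}\,\hat X_{k+1}^{1-1/q} = \mathrm{Tr}\big((X_k^{-1/q} - PP^\top)^{-1}\big)^{q-1} = \mathrm{Tr}\Big(X_k^{1/q} + X_k^{1/q} P \big(I - P^\top X_k^{1/q} P\big)^{-1} P^\top X_k^{1/q}\Big)^{q-1} \le \mathrm{Tr}\Big(X_k^{1/q} + \frac{X_k^{1/q} PP^\top X_k^{1/q}}{1-\beta}\Big)^{q-1},$$

where the last inequality follows because $(I - P^\top X_k^{1/q} P)^{-1} \preceq \frac{1}{1-\beta} I$ owing to (E.6), as well as $0 \preceq A \preceq B \Longrightarrow \mathrm{Tr}\,A^n \le \mathrm{Tr}\,B^n$. We continue and write

$$\begin{aligned}
\mathrm{Tr}\,\hat X_{k+1}^{1-1/q} &\le \mathrm{Tr}\Big(X_k^{1/q} + \frac{X_k^{1/q} PP^\top X_k^{1/q}}{1-\beta}\Big)^{q-1} = \mathrm{Tr}\Big(X_k^{1/(2q)} \Big(I + \frac{X_k^{1/(2q)} PP^\top X_k^{1/(2q)}}{1-\beta}\Big) X_k^{1/(2q)}\Big)^{q-1} \\
&\le \mathrm{Tr}\Big(X_k^{(q-1)/(2q)} \Big(I + \frac{X_k^{1/(2q)} PP^\top X_k^{1/(2q)}}{1-\beta}\Big)^{q-1} X_k^{(q-1)/(2q)}\Big) = \mathrm{Tr}\Big(X_k^{1-1/q} \Big(I + \frac{X_k^{1/(2q)} PP^\top X_k^{1/(2q)}}{1-\beta}\Big)^{q-1}\Big),
\end{aligned}$$

where the inequality uses the Lieb–Thirring trace inequality (which relies on the fact that $q - 1 \ge 1$). Finally, denoting by $D \stackrel{\mathrm{def}}{=} \frac{X_k^{1/(2q)} PP^\top X_k^{1/(2q)}}{1-\beta} \preceq \frac{\beta}{1-\beta} I$ (which uses (E.6) again), we have

$$(I + D)^{q-1} \preceq I + (q-1) D + O(q^2\beta) \cdot D .$$

This matrix inequality can be proved by first turning into its eigenbasis, and then verifying that $(1+x)^{q-1} \le 1 + (q-1)x + O(q^2\beta)x$ for all $x \in [0, \frac{\beta}{1-\beta}]$ (which uses the fact that $\beta \le 1/(2q)$).

9 The second inequality is because $P^\top X_k^{1/q} P = (P^\top X_k^{1/(2q)})(P^\top X_k^{1/(2q)})^\top$ has the same largest eigenvalue as $(P^\top X_k^{1/(2q)})^\top (P^\top X_k^{1/(2q)}) = X_k^{1/(2q)} PP^\top X_k^{1/(2q)}$.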
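Using this inequality, we conclude that

$$\begin{aligned}
\mathrm{Tr}\,\hat X_{k+1}^{1-1/q} &\le \mathrm{Tr}\Big(X_k^{1-1/q} \Big(I + \frac{X_k^{1/(2q)} PP^\top X_k^{1/(2q)}}{1-\beta}\Big)^{q-1}\Big) \le \mathrm{Tr}\Big(X_k^{1-1/q} \Big(I + \big((q-1) + O(q^2\beta)\big) \frac{X_k^{1/(2q)} PP^\top X_k^{1/(2q)}}{1-\beta}\Big)\Big) \\
&= \mathrm{Tr}\,X_k^{1-1/q} + (q-1)\big(1 + O(q\beta)\big)\, X_k \bullet PP^\top .
\end{aligned}$$

Therefore,

$$\begin{aligned}
V_{\hat X_{k+1}}(X_k) &= \hat X_{k+1}^{-1/q} \bullet X_k + \frac{1}{q-1}\mathrm{Tr}\,\hat X_{k+1}^{1-1/q} - \frac{q}{q-1}\mathrm{Tr}\,X_k^{1-1/q} \\
&= \big(X_k^{-1/q} - PP^\top\big) \bullet X_k + \frac{1}{q-1}\mathrm{Tr}\,\hat X_{k+1}^{1-1/q} - \frac{q}{q-1}\mathrm{Tr}\,X_k^{1-1/q} \\
&= -PP^\top \bullet X_k + \frac{1}{q-1}\Big(\mathrm{Tr}\,\hat X_{k+1}^{1-1/q} - \mathrm{Tr}\,X_k^{1-1/q}\Big) \\
&\le O(q\beta) \cdot PP^\top \bullet X_k = O(q\alpha^2)\,(X_k \bullet |F_k|) \cdot \|X_k^{1/(2q)} F_k X_k^{1/(2q)}\|_{\mathrm{spe}} .
\end{aligned}$$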

If $\alpha F_k = PP^\top$, using the Sherman–Morrison–Woodbury formula,

$$\mathrm{Tr}\,\hat X_{k+1}^{1-1/q} = \mathrm{Tr}\big((X_k^{-1/q} + PP^\top)^{-1}\big)^{q-1} = \mathrm{Tr}\Big(X_k^{1/q} - X_k^{1/q} P \big(I + P^\top X_k^{1/q} P\big)^{-1} P^\top X_k^{1/q}\Big)^{q-1} \le \mathrm{Tr}\Big(X_k^{1/q} - \frac{X_k^{1/q} PP^\top X_k^{1/q}}{1+\beta}\Big)^{q-1},$$

where the last inequality follows because $(I + P^\top X_k^{1/q} P)^{-1} \succeq \frac{1}{1+\beta} I$ owing to (E.6), as well as $0 \preceq A \preceq B \Longrightarrow \mathrm{Tr}\,A^n \le \mathrm{Tr}\,B^n$. We continue and write

$$\begin{aligned}
\mathrm{Tr}\,\hat X_{k+1}^{1-1/q} &\le \mathrm{Tr}\Big(X_k^{1/(2q)} \Big(I - \frac{X_k^{1/(2q)} PP^\top X_k^{1/(2q)}}{1+\beta}\Big) X_k^{1/(2q)}\Big)^{q-1} \\
&\le \mathrm{Tr}\Big(X_k^{(q-1)/(2q)} \Big(I - \frac{X_k^{1/(2q)} PP^\top X_k^{1/(2q)}}{1+\beta}\Big)^{q-1} X_k^{(q-1)/(2q)}\Big) = \mathrm{Tr}\Big(X_k^{1-1/q} \Big(I - \frac{X_k^{1/(2q)} PP^\top X_k^{1/(2q)}}{1+\beta}\Big)^{q-1}\Big),
\end{aligned}$$

where the inequality again uses the Lieb–Thirring trace inequality. Denoting by $D \stackrel{\mathrm{def}}{=} \frac{X_k^{1/(2q)} PP^\top X_k^{1/(2q)}}{1+\beta} \preceq \frac{\beta}{1+\beta} I$ (which uses (E.6) again), we see that

$$(I - D)^{q-1} \preceq I - (q-1) D + O(q^2\beta) \cdot D .$$

This matrix inequality can be proved by first turning into its eigenbasis, and then verifying that $(1-x)^{q-1} \le 1 - (q-1)x + O(q^2\beta)x$ for all $x \in [0, \frac{\beta}{1+\beta}]$ (which uses the fact that $\beta \le 1/(2q)$).
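The two scalar inequalities used in the eigenbasis arguments can be sanity-checked on a grid. The text only asserts an $O(q^2\beta)$ slack; the explicit constant 2 below is our own choice, so treat it as an assumption of this sketch. For the first inequality it follows, for instance, from $(1+x)^{q-1} \le e^{(q-1)x} \le 1 + (q-1)x + (q-1)^2x^2$ when $(q-1)x \le 1$.

```python
def check_scalar_bounds(q, beta, steps=2000, const=2.0):
    """Check both eigenbasis inequalities on a grid, with slack const * q^2 * beta * x."""
    p = q - 1
    slack = const * q * q * beta
    ok = True
    top_minus = beta / (1.0 - beta)   # range of x when alpha F_k = -PP^T
    top_plus = beta / (1.0 + beta)    # range of x when alpha F_k = +PP^T
    for i in range(steps + 1):
        x = top_minus * i / steps
        ok &= (1.0 + x) ** p <= 1.0 + p * x + slack * x + 1e-12
        x = top_plus * i / steps
        ok &= (1.0 - x) ** p <= 1.0 - p * x + slack * x + 1e-12
    return ok


# beta ranges up to 1/(2q), as required by the assumption of Theorem 3.3
results = [check_scalar_bounds(q, beta)
           for q in (2, 3, 4, 6, 8)
           for beta in (1e-4, 0.01, 1.0 / (4 * q), 1.0 / (2 * q))]
```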


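This concludes that

$$\begin{aligned}
\mathrm{Tr}\,\hat X_{k+1}^{1-1/q} &\le \mathrm{Tr}\Big(X_k^{1-1/q} \Big(I - \frac{X_k^{1/(2q)} PP^\top X_k^{1/(2q)}}{1+\beta}\Big)^{q-1}\Big) \le \mathrm{Tr}\Big(X_k^{1-1/q} \Big(I - (q-1)\big(1 - O(q\beta)\big) \frac{X_k^{1/(2q)} PP^\top X_k^{1/(2q)}}{1+\beta}\Big)\Big) \\
&\le \mathrm{Tr}\,X_k^{1-1/q} - (q-1)\big(1 - O(q\beta)\big)\, X_k \bullet PP^\top .
\end{aligned}$$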
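Therefore,

$$\begin{aligned}
V_{\hat X_{k+1}}(X_k) &= \hat X_{k+1}^{-1/q} \bullet X_k + \frac{1}{q-1}\mathrm{Tr}\,\hat X_{k+1}^{1-1/q} - \frac{q}{q-1}\mathrm{Tr}\,X_k^{1-1/q} \\
&= \big(X_k^{-1/q} + PP^\top\big) \bullet X_k + \frac{1}{q-1}\mathrm{Tr}\,\hat X_{k+1}^{1-1/q} - \frac{q}{q-1}\mathrm{Tr}\,X_k^{1-1/q} \\
&= PP^\top \bullet X_k + \frac{1}{q-1}\Big(\mathrm{Tr}\,\hat X_{k+1}^{1-1/q} - \mathrm{Tr}\,X_k^{1-1/q}\Big) \\
&\le O(q\beta) \cdot PP^\top \bullet X_k = O(q\alpha^2)\,(X_k \bullet |F_k|) \cdot \|X_k^{1/(2q)} F_k X_k^{1/(2q)}\|_{\mathrm{spe}} .
\end{aligned}$$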

Finally, substituting the above upper bound on $V_{\hat X_{k+1}}(X_k)$ into (E.5) and telescoping it for $k = 0, \dots, T-1$, we obtain

$$\sum_{k=0}^{T-1} \langle F_k, X_k - U \rangle \le \frac{V_{\hat X_0}(U) - V_{\hat X_T}(U)}{\alpha} + O(q\alpha) \sum_{k=0}^{T-1} (X_k \bullet |F_k|) \cdot \|X_k^{1/(2q)} F_k X_k^{1/(2q)}\|_{\mathrm{spe}} .$$

The desired result of this theorem now follows from the above inequality, the simple upper bound $V_{\hat X_0}(U) = V_{X_0}(U) \le \frac{q}{q-1} n^{1/q}$, and the non-negativity $V_{\hat X_T}(U) \ge 0$. □


F Robust Linear-Sized Sparsification 

In this section, we deduce a more general version of the result presented in Section 5, with the following major differences.

• Regularizer. In this section, we allow the general $\ell_{1-1/q}$ regularizer to be used, for any even integer $q \ge 2$, rather than just the $\ell_{1/2}$ regularizer. (The assumption that $q$ is an even integer, rather than any real number no less than 2, is only for the sake of proof convenience.)

• High rank. In this section, we allow $L_e$ to be possibly of high rank, rather than just rank 1.

• Approximate computations. In this section, we allow many computations to be approximate rather than exact. This will enable the algorithm to be implemented more efficiently in the next section (Appendix G). In particular, we allow the following quantities to be approximately computed.

 – We only need $\mathrm{Tr}\,L_e$ to be in $[1 - \varepsilon_1, 1]$ rather than exactly one.

 – We only need $\mathrm{Tr}\,X_k$ and $\mathrm{Tr}\,Y_k$ to be in $[1, 1 + \varepsilon_1]$ rather than exactly one.

 – We only need $L_e \bullet X_k$ and $L_e \bullet Y_k$ to be computed up to a $(1 + \varepsilon_1)$ multiplicative error.

We will assume throughout this paper that $\varepsilon_1 \le 1/2$.


F.1 The Problem

Suppose we are given a decomposition of the identity matrix $I = \sum_e w_e L_e$, where each $L_e$ satisfies ① $0 \preceq L_e \preceq I$, ② $\mathrm{Tr}\,L_e \in [1 - \varepsilon_1, 1]$, and ③ $L_e$ may be of high rank. The weights $w_e > 0$ may be unknown.

In this section, we are interested in using the $\ell_{1-1/q}$ regularizer for $\mathtt{MirrorDescent}_{\ell_{1-1/q}}$ in order to find scalars $s_e \ge 0$ satisfying

$$I \preceq \sum_e s_e L_e \preceq \Big(1 + O(\varepsilon) + O\big(\varepsilon_1 + q\varepsilon^2 + \varepsilon_1 \varepsilon \sqrt{q}\big)\Big) I, \tag{F.1}$$

while the sparsity of $s$, that is, $|\{e \in [m] : s_e > 0\}|$, is at most $n/\varepsilon^2$. We will not worry about the running time in this section, and defer all the implementation details to Appendix G.
Throughout this section, we pick $w(X)$ to be the $\ell_{1-1/q}$ regularizer and $V_X(Y)$ to be its induced Bregman divergence.

F.2 Our Algorithm 

Maintain two sequences $X_k, Y_k \succ 0$ satisfying $\mathrm{Tr}\,X_k, \mathrm{Tr}\,Y_k \in [1, 1 + \varepsilon_1]$. At the very beginning we choose $X_0 = \frac{1}{n} I$ and $Y_0 = \frac{1}{n} I$ as before.

At each iteration $k = 0, 1, \dots, T-1$, find an arbitrary $e_k$ such that

$$\mathrm{Dot}(L_{e_k}, X_k) \le (1 + \varepsilon_1)^2\, \mathrm{Dot}(L_{e_k}, Y_k),$$

where $\mathrm{Dot}(L_e, A)$ is some algorithm$^{10}$ that approximately computes $L_e \bullet A$ and satisfies

$$L_e \bullet A \le \mathrm{Dot}(L_e, A) \le (1 + \varepsilon_1) \cdot L_e \bullet A .$$

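We can always do so because, after averaging,

$$\sum_e w_e\, \mathrm{Dot}(L_e, X_k) \le (1 + \varepsilon_1) \sum_e (w_e L_e) \bullet X_k = (1 + \varepsilon_1)\,\mathrm{Tr}\,X_k \le (1 + \varepsilon_1)^2\, \mathrm{Tr}\,Y_k = (1 + \varepsilon_1)^2 \sum_e (w_e L_e) \bullet Y_k \le (1 + \varepsilon_1)^2 \sum_e w_e\, \mathrm{Dot}(L_e, Y_k),$$

where the middle inequality uses $\mathrm{Tr}\,X_k \le 1 + \varepsilon_1 \le (1 + \varepsilon_1)\,\mathrm{Tr}\,Y_k$.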
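The averaging argument guarantees that the selection rule never gets stuck. The sketch below simulates it with made-up data. The hypothetical values of $L_e \bullet X_k$ and $L_e \bullet Y_k$ are scaled so that $\sum_e w_e\, L_e \bullet X_k = \mathrm{Tr}\,X_k$, likewise for $Y_k$, with both traces in $[1, 1 + \varepsilon_1]$. A stand-in for $\mathrm{Dot}$ inflates each true value by a random factor in $[1, 1 + \varepsilon_1]$. This is not the paper's implementation of $\mathrm{Dot}$, which is deferred to Appendix G.

```python
import random


def pick_edge(dot_x, dot_y, eps1):
    """Return some e with Dot(L_e, X_k) <= (1 + eps1)^2 Dot(L_e, Y_k), or None."""
    for e, (a, b) in enumerate(zip(dot_x, dot_y)):
        if a <= (1.0 + eps1) ** 2 * b:
            return e
    return None


def simulate(seed, m=30, eps1=0.1):
    """One random instance satisfying the trace conditions of F.2 (hypothetical data)."""
    rng = random.Random(seed)
    w = [rng.uniform(0.1, 1.0) for _ in range(m)]
    raw_x = [rng.uniform(0.0, 1.0) for _ in range(m)]
    raw_y = [rng.uniform(0.0, 1.0) for _ in range(m)]
    tr_x, tr_y = rng.uniform(1.0, 1.0 + eps1), rng.uniform(1.0, 1.0 + eps1)
    # rescale so that sum_e w_e (L_e . X_k) = Tr X_k, and likewise for Y_k
    sx = tr_x / sum(wi * r for wi, r in zip(w, raw_x))
    sy = tr_y / sum(wi * r for wi, r in zip(w, raw_y))
    true_x = [sx * r for r in raw_x]
    true_y = [sy * r for r in raw_y]
    # stand-in for Dot: the true value times a factor in [1, 1 + eps1]
    dot_x = [t * rng.uniform(1.0, 1.0 + eps1) for t in true_x]
    dot_y = [t * rng.uniform(1.0, 1.0 + eps1) for t in true_y]
    return pick_edge(dot_x, dot_y, eps1)


found = [simulate(seed) for seed in range(200)]
```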

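At each iteration $k = 0, 1, \dots, T-1$, we perform updates by finding$^{11}$ arbitrary $\delta_X, \delta_Y \ge 0$ satisfying

$$X_{k+1} \stackrel{\mathrm{def}}{=} \Big(X_k^{-1/q} - \frac{\alpha L_{e_k}}{\mathrm{Dot}(L_{e_k}, X_k)^{1/q}} + \delta_X I\Big)^{-q} \succ 0 \quad \text{and} \quad Y_{k+1} \stackrel{\mathrm{def}}{=} \Big(Y_k^{-1/q} + \frac{\alpha L_{e_k}}{\mathrm{Dot}(L_{e_k}, Y_k)^{1/q}} - \delta_Y I\Big)^{-q} \succ 0,$$

with $\mathrm{Tr}\,X_{k+1}, \mathrm{Tr}\,Y_{k+1} \in [1, 1 + \varepsilon_1]$.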

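Above, $\alpha > 0$ is some parameter that will be specified at the end of this section. Note that this corresponds to performing the updates

$$\text{"} X_{k+1} \leftarrow \operatorname*{arg\,min}_{Z \in \Delta_{n\times n}} \Big\{ V_{X_k}(Z) + \Big\langle \frac{-\alpha L_{e_k}}{\mathrm{Dot}(L_{e_k}, X_k)^{1/q}}, Z \Big\rangle \Big\} \text{"} \quad \text{and} \quad \text{"} Y_{k+1} \leftarrow \operatorname*{arg\,min}_{Z \in \Delta_{n\times n}} \Big\{ V_{Y_k}(Z) + \Big\langle \frac{\alpha L_{e_k}}{\mathrm{Dot}(L_{e_k}, Y_k)^{1/q}}, Z \Big\rangle \Big\} \text{"};$$

however, we have not required $\mathrm{Tr}\,X_{k+1}$ and $\mathrm{Tr}\,Y_{k+1}$ to be precisely equal to 1.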

For analysis purpose only, we also define Afc+i and Afc + i to be similar updates but without 5x 
or dy- 


Afc +1 = f (X~ l/q + 


—aL, 


Sk 


Dot(L efe , Afc) 1 /'? 


and Afc+i = (V fc 1/q + 


aL 


6 fc 


Dot (L e ,, Afc ) 1 A 


-q 


We assume also Aq = Aq. 


Note that Tfc+i is always well defined. Claim F.l below shows that as long as a < 1, it always 


satisfies A. + —— C"’*' , A 0, so Afc+i is also well defined. 

Dot(Le k ,Xk) 'Q 

Claim F.l. For every e E [m], we have X, l ^ q A — L * , A 

(Le+Xu) ' Q 


—aL , 


by 


aL e 


Dot (L e ,X k y/i 


— = PP T , we have 0 A P T X^ q P P al. 


Dot (L e ,X k y/i 


. In addition, denoting 


10 The implementation of this algorithm will be described in Appendix G. 

n The existence of such 5x and Sy shall become soon (due to Claim F.l'. The implementation of these updates 


will be described in Appendix G. 
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Similarly, for every e 6 [m], we have Y, P ? , 

\L e *Yk) /q 

by - Y L - = PP T , we have 0 -< P T Y k /q P P al. 

^ Dot (Le.Yfc) 1 /® ' - k - 


P 


Dot {L e ,Y k y/o' 


In addition, denoting 


Proof. We only prove the X k part because the Y k part is similar. We hrst compute 
\\xl /2q L e xl /2q \\l pe < Tr ({xl /2q L e xl /2 y) < Tr(A fe 1 / 2 (L e )«A fe 1/2 ) , 
where the last inequality follows from the Lieb-Thirring trace inequality. 

Next, using the fact that L e P I, we obtain that ( L e ) q P L e . Therefore, 


I xl / 2 , teXl / 2 , \\i P , < THX t 


1/2 


L e X 


1/2n _ 


— Le • X k 


In other words, we have X- 
automatically have 


1/2 q 




p 


P (L e • Xk) l ! q ■ I which means X k P 
because Dot (L e ,X k ) > L e • X k . 


(L e .X k )YY 


We 


(L e .X k )Yi ~ Dot(L e ,.Y fe )V9 

To prove the second half, beginning from X k l ^ q P ^ • PP T , we left multiply it with P T X^J q 
and right multiply it with X X k q P, and obtain P T X^ q P P ^ ■ P T X^J q PP T X^J q P. Denoting by 


D = P T x} ,q P , we have D P — D 2 , which immediately implies 0 P D P al as desired. 


□ 


We have now finished the description of the algorithm. We remark here that TrAfc+i < TrW, 
and TrYfc_|_i > Trl*.. Therefore, since increases as 5x increases, while Trl^+i decreases as 

Sy increase, we conclude the existence of Sx, by > 0 so that TrA^+i, TrY^+i e [ 1,1 + ei]. 


F.3 Our Analysis 

We begin by reproving essentially the first half of Theorem 3.2: that is, to prove (E.3), We need 
to pay extra attention here since our TrX k and TrY*, do not precisely equal to 1. 

Lemma F.2. For every Ux P 0 satisfying Tr Ux < 1, and every Uy P 0 satisfying Tr Uy > 1 + E\, 


/ _ a Le k y 

Dot(L efc , Afc) 1 /? ’ k 

/ __ Yl 

'Dot (L ek ,Y k )^ 


U *) £ + v x„Wx) - 

Uy)<V ft J Y 0 + v y k (Uy)-Vy t J U '-) ■ 


and 


Proof. We hrst prove the X k part. By our choice of the regularize^ we have 


0 = Vw(X k+1 ) - Vw(X k ) +- aLek . 

Dot (L ek ,X k y/ q 


t—i/q 

L fc+1 


'—1/9 

L fc 


—aL , 




Next, we obtain that 


( 


— aL 


Zk 


Dot (L ek ,X k y/ q 


Dot (L ek ,X k y/<i 
, X k - Ux) = (Vw(X k ) - Vw(X k+1 ),X k - Ux) 


= V Xk (Ux) - V^JUx) + V Xt JX k ) 

<VxSVx)-V Xk JUx) + V Xi JX t ) . 
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Above, ® is due to the three-point equality of Bregman divergence, and ® comes from 


Vx k (Ux) - V x - k (Ux) = (X- 1 '" - X~‘"<) . U X + —V-(TrX 


-l/9\ 


1 


r1-1/9 - TrXl~ 1/q ) 




= SxTrUx + 




q - 1 Y Xf 1 (X - ix)°-‘ 

® 1 ® 

< SxTrUx - <)x ^< () • 


Here, © is owing to the definition of Bregman divergence. © comes from the fact that X k l{ q = 

X } H q — 6x1, and the dehnition of choosing A j to be the z-th eigenvalue of X k H q . © follows from 
the convexity of f[x) = which implies f(Xi) — f(Xi — 5x) < V/(Aj)-<5x-. © is by our assumption 
of Trf/x < 1 as well as TrA/t+i = J® -A, > 1. 

Similarly, for the Y k part, we can compute 

/V j j 

L e * 17 - Uy) = (Vw(Y k ) - Vw(Y k+1 ),Y k - U Y ) 

Dot 

= V Yt (Uy)-V ft JUy) + V % JY k ) 

<Vy t [Uy)-Vy t JU Y ) + V fi! JY k ) . 

Above, ® is due to the three-point equality, and inequality ® comes from 

Vy,(Ur) ~ VyJUy) S (yir 1/3 - Y 11 ") * Uy + W-f TrY^ - TrY {- 1/a ) 

< —SyTtUy + Sy fg — ^ • 

i 1 

Here, © is owing to the definition of Bregman divergence. ® comes from the fact that Y k H q = 

Y k ,+ <5yl, and the definition of choosing A,; to be the z-th eigenvalue of © follows from 

the convexity of f(x) = x 1 ~ q which implies /(A,) — /(A* + 6y) < V/(Aj) • (—<5y). © is by our 
assumption of Trt/y > 1 + e\ as well as TrY^yi = jq <l+£l. □ 


In a next step, we reprove essentially the second half of Theorem 3.2: that is, to provide upper 
bounds on (X^) and Vy k { (Y k ) in Lemma F.3 and Lemma F.4. 


Lemma F.3. As long as q > 2 and a < 1/2 q, we have 

V x k J X k) < + 0(qa 3 )) • (L ek . X k ) 1 ~ 1/q . 


Proof. Suppose feot( £ J fc)1/ , -*= Then, 

TrX^J 79 = Tr((A" 1/9 - LAP 71 ) -1 ) 9-1 = 


using the Sherman-Morrison-Woodbury formula, 


rl /<Z _L y 1 /? r>z T _ pT 1 yl/9 p\-l pT yV^A 9 1 


Tr ( X k + X/ /9 P(I - P 1 X k q P'y P X 

■/„ . xpPpxPv-' 


* 1 


Tr( Xr q + 


1 — a 
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where the last inequality follows because (/ — P T X^ q P) 1 ^ yh_/ owing to Claim F.l. as well as 
A ~< B => TrA n < Tr B n . We continue and write 


TrX 


1-1 /q 


k-\-l — 


< Tr ( X'J q + 


l/Q xl /q PP T xl /q y ?-i ( 1 / 20 / xl /2q PP T x'/ 2q , r / 2 q\ 


9-1 


, „ , yl/2<7 p pT y"1/^9 . 

^ 'jy J y(9 _ 1)/^9 _|_ fc _ v fc ^ 9~l^ylg—l)/2q 


= Tr I 


1-1/9 


1 — a 
y 1/29 ppT v 1/^9 

(/+^—f-^ 

v 1 — a 


ry. 


where the inequality uses the Lieb-Thirring trace inequality (which relies on the fact that q — 1 > 1). 

J r yl/29 p pT 

Finally, denoting by D = ——^ we see that 


(/ + £>)* 1 ^/ + (g-l)L> + ( 


(g-l)(g-2) 


a + 0(g 3 a 2 )^-D . 


This above matrix inequality can be proved by first turning into its eigenbasis, and then verifying 
that (1 + x) 9_1 < 1 + (q — l)x + + 0(</ 3 a 2 )x for all x G [0, yz^]- (This uses the fact 

that a < l/2(/). Next, using the above matrix inequality, we conclude that 


TVX 


1 - 1/9 


k +1 — 


<Tr X 


yl/29 p pT y"1/29 

! — !/<?/ p I ^~1 


'(/ + 


< TV X 


fc ' 1 — a 

1 - 1/9 


(jf;- i/ »(/+((,-i) + 


)‘ 

0( ,V)i^ /2,ppT ^ /2a 


-a + 




1 — a 


)) 


l—1/0 / 1 ^9 cy + 0 (q ol ) ^ 

= TVX j J 1/9 + g-1 ---- -X k .PP T . 

1 — a 


Therefore, 


V x k J X 0 = • X “ + Xl TlX 


Q 


1 - 1/9 

fc+i 


Q TrXl~ 1/q 


9-1 


= (XW’ - PP T ) . X k + 


= - pp 1 . x fc + 


1 


- (TVX 


1 

1 - 1/9 

k +1 


—-—TrX^~ 1/9 
q - 1 k 


- TrXl 1 '") 


T ( 1 + + 0(q 2 a 2 ) 

< pp t • X k ( - 1 +----- 

V 1 — a 

= | (a + 0(qa 2 ))-PP T »X k 


<| (a 2 + 0(qa 3 ))-{L ek .X k ) 


1 - 1/9 


□ 


Lemma F.4. As long as q > 2 and a < 1/2(7, we /iat>e 

% +1 (n) < ^(« 2 + 0(a 3 )) • {L ek . n) 1 - 179 . 


Proof. Suppose 


aie 


ffc+ 1 ' "" - 2 

jT 


Dot(L efe ,y fc )i /9 


—- = . Then, using the Sherman-Morrison-Woodbury formula, 


TrT, 


1 - 1/9 _ 


fc+1 


Tr((Y~ 1/q + PP T )~ l ) q 1 = Tr(V fc 1/9 - Y^ /q P(I + P T Y^ /q P)- 1 P T Y^ /q ^j 


9-1 


<T V( yV9_ *k 


Y}J q PP T Yl /q \ 9-1 


1 “h Q 
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where the last inequality follows because (I + P T Y^ q P) 1 + y_I owing to Claim F.l. as well as 
A < B ==> YrA n < Tr B n . We continue and write 


Try, 


1-1 /q 


fc+1 - 


< Tr ( Y,} /q - 


Y,' /q PP T Y' lq \q-i 


1 CX 


k _ k _^ Y ^y 1 


= Tr +«(/- 


1 H - Oi 


yl/2q ppTv^l/2g 

y^~ _ k ^ (g~l)/2g 


= Tr 


(W'M 


1 T cr 

\+/2g p pT~yi/2g 

jfc J 7 M w-i\ 

1 + a ' / ’ 


where the inequality again uses the Lieb-Thirring trace inequality (which relies on the fact that 

j r y^/'Zq p pTy^/^q 

q — 1 > 1). Denoting by D = — - 1+Q , k — + yyM, we see that 


(. I-D) q 1 <I-{q-\)D + 


('q - i)(<? - 2 )a 


D . 


2(1 +a) 

This above matrix inequality can be proved by first turning into its eigenbasis, and then verifying 
that (1 — x) q ~ l < 1 — (q — l)x + D- 1 )^- 2 ) -^-^x for all x € [0, . (This uses the fact that 

a < 1/2 q). Next, using the above matrix inequality, we conclude that 


Try 


/ yl/ 2 9pnTyl/ 2 9 

1—I/? yl — l/?(j _ k k Q 1 


jfc+1 — 

< Tr 


1 + Oi 


-) £ 


= Try 


1_1 / 9 - ( q - 1)(1 - (g ~ 2)a ) — 
1 {q >y 2(1 + 0 +! + 


Therefore, 


W,( 5 W = 5 l+ 1 l / ’* y i + 


r fc+ 1 




2(1 + o) 1 + o 

1-1/9 


y fc . pp 1 


q T tY£~ 1/9 


q- 1 


= (+ 


+pp j 


y fe + 1 Try, 1 /, 1 /<3 - 

g - 1 fc+i q - 1 


9 Tvy fc 1_1/9 


= PP J • Y k + 


<pp T .y(i- 


1-1/9 

fe+i 


yi'v, 

(g- 

2(l+a) 


- nr+- 1/! ) 


_ (g—2)g \ 
\ 2(1+0;)/ 


1 “h Q! 

= |(o + 0(o 2 ))-PP T .y fc 
<|(a 2 + 0 (a 3 ))-(L efc .y fe ) 


1-1/9 


□ 


Theorem F.5. Suppose e < and £1 < \, and we choose a = ^ 7 == and T = + Then, t/ie 
matrix My == Ylk=o y— y Lefc 1/q satisfies that 

\ e k , fcj 

^max (My) - A m i n (My) < A min (My) • ( 

This theorem provides the sparsification guarantee to our Theorem 1 and 2, We shall provide its 
running time guarantee in the next section. 


]J ■ e + °( e i + 'y 2 )) 
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Proof. Define matrices Mj = ElT 

,2 


— and M y d = f E^ 


0 Dot(L e , ,n) 1/9 ' 


0 Dot(L e , j.Yj,) 1 /® 

byr=l(« + 0 (^ 2 ))- 

We are now ready to rededuce (5.3) and (5.4) in Section 5, 

Combining Lemma F.2 and Lemma F.3. and telescoping for k = 0,1,..., T — 1, we have 

V~ (U ) T—l 

VUx p 0 satisfying TrU* = 1, M x • U x < *° + (1 + 0 E ( F e fe • *fc) 1-1/9 


Also, denote 


< 


qn 


a 

V? 


(F.2) 


k =0 
T—l 


(q -!)« 


+ (l+^(fe t »^) W/? 


(F.3) 


k =0 


Above, the second inequality uses the fact that (U x ) < ^-ra 1 / <? . 

Combining Lemma F.2 and Lemma F.4, and telescoping for k = 0,1,..., T — 1, we have 


VUy U 0, TrC/y = 1 + £\, My • Uy > — 


Vy 0 (Uy) 


T—l 


a 


+ (l-{)J2(L ek 'Y k ) 1 


-l/q 


k =0 


T—l 


> - 9(1+£l)nl/g + (i - o £ (U • u-)‘- 1/s 

fc=0 


(9 - l)a 


(F.4) 


Above, the second inequality uses the fact that Vy o (Uy ) < 

Similar to the proof in Section 5. we provide deduce our eigenvalue inequality in two steps. 

Lowerbounding A m j n (Afy). Since we have assumed each TrL e to be at least 1 — £y, we have 

T—l 


Tr (M X ) = Tl ' Ze * 


>-^-E 


1 


to Dot(L ek ,X k )V* + ^ (L ek .X k y/« ' 

Denoting by a k = L ek • X k , we can write Tr (M x ) > Efc=o "W Applying ( F - 2 ) with the 

a k 

choice of U x = = X 0 , we have 

T—l 

T - y, U—V 9 v o _l 


1 £l E u TrM * = ^ < (i+o E&* • ^) 1_1/? < a+o E a ' _1/9 


Using the above inequality we obtain 
T—l 


k =0 


£■ 

k =0 


1-1/9 


> 


1 


T—l 


T—l 


> 


(n(l + £)(1 + £i) l / q (l — ei) x ) fc =0 

T 


171=17 i(ET 1/8 ) 1/s (E 


fc=0 a k 


l/q' 


k =0 


U-l/9 


Tl 1- !/9(1 + £) 1_ !/9(1 + £l )l/9-l/9 2 (l - £l )l/9-l ’ 
where the last inequality follows from Holder’s inequality. If we choose T = this immediately 
gives 

n 1 /9 


T—l 




= £ a!r 1/s 


- £ 2 


-(1 - 0(ga + £i)) . 


(F.5) 


fc =0 fc =0 

Finally, substituting (F.5) into (F.4), and choosing Uy so that My • Uy = (1 + £i)A m j n (My), 
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we have 


(i + El )WMv)>-?4±il^ + (i-0 

(Q ~ !)« 

2qn l t q 


T -1 


(1 + £l) 3 ~ 3 / g “ 0 


>M„.‘W 1 


-1/9 


1 -ri 1 /Q 

- + I 1 -«) (T^jVaTi 7^ (1 - ° ( «“ + El)) 

2qn 1 / q n l / q . 

- _ (^l> + ^ (1_0(ga + ei)) 


n 


1/9 


> —^-(1 - 0(<?a + ei + £ /a)) . 
e z 

Above, the first inequality is due to our choice of e*, which satisfies 

(1 + £i fL ek •Y k > (1 + £i) 2 Dot(L ejt ,y fc ) > Dot(L efc ,X fc ) > L ek • X k 


(F.6) 


(F.7) 


Upper bounding A max (ATy) — A m j n (il4y). This time, combining (F.3) and (F.4). as well as 
using (F.7), we compute that 


1/9 


— —z(MyUx— qU s ) < -AM x .U x - , qn '\ ) 

1 + £ V (g-l)a'“l+£ V ( q-l)a J 


V9 ^ (l+ £l )3-3/9 


(. MyUy+ 


q(l + £\)n 1 / q 


) • 


i - £ r ^ (q- i)« 

Choosing U x so that My • U x = A max (My), and t/y so that My • C/y = (1 + £i)A m j n (My), we can 
rewrite the above inequality as 


, " I/ ’ '' (1 + £l) ” / ' ( H £l )(U«y) + 7 ^-) 


< 


(F.8) 


l-£ v . .. ' (q~lW 

To turn this joint multiplicative-additive error into a purely multiplicative one, we further rewrite 
it as 


^max (My) - A min (My) < 


2£ + O(ei) 1 + £ + 0(£i) qn l / q i qn 1 ^ 


< 


i-e 

2£ + Q(ei) 

i - £ 

2£ + 0(u) 


Amin (My) + 


Amin (My) + 


+ 


l-£ (g-l)a (g-l)a 
2g l + 0(ei)nV« 


<7 — 1 1 — £ a 


2u £ 2 

. t Amin (My) -| - — T ‘ A m in (My) — (1 + 0(qa + £1 + £ 2 /a)) 

1 — £ a 

= A m | n (My) • f qa H-— ——hO(£i + q£ 2 + £i£ 2 /a + £^/ a 2 + ya 2 )') . 

V o — 1 a / 


Above, the second inequality uses (F.6). Now, it is clear that by choosing a = we have 

Amax(My) — A m in(My) < A m in(My) • ( J —■ £ + 0(£l + qC 2 + £\£y/q) 


< 


Amin (My) ' ^ £ ' £ ^ £2 )) ’ 


□ 


F.4 An Additional Property 

Recall that in the previous subsection, we have constructed M x and My and proved that A m i n (My) 
(and in fact A m in(My) as well) is at least Q(n l / q / e 2 ). In this subsection, we shall show that 
Amax(Mx) and A ma x(My) can be made at most Oin 1 ^ / e 2 ) as well. While this additional property 
is not needed for proving Theorem F.5. it shall become useful for proving the desired running time 
in the next section (see Appendix G). 
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The following lemma ensures that if we stop the algorithm “whenever we are done”, and thus 
choose possibly less than n/e 2 matrices, then, A max (Mx) and A max (My) can be properly upper 
bounded. 


Lemma F.6. If one stops the algorithm either when T = ^ iterations are performed, or when the 

first time that Yk=o Dot(L efc , Afe) 1-1 /? > is satisfied, then the same result of Theorem F.5 can 
be obtained, while we have an extra guarantee 

nf ^ q 

Amax (M X ), A max (M y )<0(-^~) . 


Proof. Recall that in the proof of Theorem F.5, we have only used the choice of T = to deduce 
(F.5), For this reason, if instead of choosing exactly T = ^ matrices, we 


T—l 

stop the algorithm at the first time T such that Dot (L ek , > 

k =o 



is satisfied, 


then we automatically have 


T 1 ^ „ l/q 

(i-o( £l )) . 

k =0 

Replacing (F.5) with the above lower bound, all results claimed in Theorem F.5 remain true. 

In the rest of the proof, we will show that this early termination rule ensures a good upper 
bound on A ma x(Mx) and A max (My). Indeed, at the time the algorithm is terminated, we must 
have 

T-i T—i l/q 

<^Dot(L ek ,X k ) l -V« <^ + 0(1) . (F.9) 

fc =0 fc =0 

This is because, since L ek • X k < I • X k = 1 and thus Dot (L ek , Xk) 1 ^ 1 ^ < 0(1), the value 
Dot (L ek , Xk) 1 ^ 1 ^ is incremented by at most 0(1) at each iteration. As a consequence, at 
the first iteration it exceeds n 1//g /e 2 , the summation must be at least nf! q /e 2 + 0(1). 

Next, substituting (F.9) into (F.3). and choosing Ux so that Mx • Ux = A max (Mx), we have 

nr )^-/q 77I/9 77 1/9 

Ama x(M X ) < ^ + (1 + + 0(1) = ■ 

Finally, recalling that we have chosen Dot(L efc ,Xfc) < (1 + £i) 2 Dot (L ek ,Y k ), this ensures that 
(1 + £ 1 ) 2 M x P My. In sum, we obtain that A max (My) < 0(A max (Mx)) < EH 


G Efficient Implementation for Graph Sparsifications 

Recall from Appendix F that in order to implement the algorithm described in Theorem F.5. we 
need to 


(Cl) Ensure that each TrL e is in [1 — £ 1 ,1], 


(C2) Compute at each iteration two reals c x ,c* G M satisfying that TrX k e [1,1 + £ 1 ] and TrY), G 
[1,1 + £ 1 ], where 

k—1 't k —1 


cxL e . 


X k d ^(c x -I-Y- „ 

V Dot{L ej ,Xj) 1/q 


and F fc =(V-^- 

Dot( L ej ,Yj) 1/q 


-c Y I 


-q 
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(C3) Compute at each iteration Dot (L e ,X k ) and Dot (L e ,Y k ) which satisfy 

L e • X k < Dot (L e ,X k ) < (1 + si)L e • X k and L e »Y k < Dot (L e , Y k ) < (1 + si)L e • Y k . 


In this section, we suppose that we are dealing with a spectral graph sparsification instance 

^ i/ 2 ^ i/ 2 

(see Appendix B l. In other words, we use I to denote I\ m (L G )i an d have L e = —— w ° G —, where 
w e = Lq 1 • L e is the effective resistance of edge e £ [m]. 

Knowing this scaling factor w e is somewhat important, because we need to ensure that TrL e is 
between 1 — e k and 1 according to (Cl). Fortunately, Spielman and Srivastava SSI 1] have given 
an algorithm that runs in nearly-linear time, and produces the effective resistances Lff • L e up to 
a multiplicative error of 1 + s\ for all edges e £ [m], with probability at least 1 — 

—. L -1 / 2 LeL,- 1 / 2 

In other words, we can denote by L e = —— w e G —, where each w e only needs to be between 
(1 — S\) ■ L^ 1 • L e and L^} • L e . 

We next wish to show how to implement (C2) and (C3) efficiently. Before that, let us claim 
that 

Lemma G.l. Regardingless of how (C2) and (C3) are implemented, for all iterations, c x ,c* < 

0(a^) = 0(^J). 


Proof. It is first easy to see that c 5 < a ■ A max (My) < 0(a—-p-) owing to Lemma F.6, Next, since 
TrXfc > 1, we must have 


< A 


_ /v max 


aL P 


k -1 

y —^- 

U Dot (L^Xj)'/* 


V? 


+ n 1/q < a ■ A max (Mx) + n 1/q < 0(a-^~) 


□ 


Now, we are ready to prove the main theorem of this section. 


Theorem G.2. In an amortized a running time of 0{^fqn 1 / q m/s\s) per iteration, we can imple¬ 
ment (C2) and (C3) with probability at least 1 — 

Combining this with the fact that there are at mots iterations, the total running time of our 
graph sparsification algorithm is 


O 


yfqn 1+1 / q m\ 

sp ) ■ 


“This amortization can be removed, but will result in a slightly more involved implementation to analyze. 


Our proof below will make frequent uses of Lemma G.3 and Lemma G.4. two independent lemmas 
regarding how to efficiently compute matrix inversions of the form (cl — A)~ q as well as (A — cl)~ q . 
The statements and proofs of these two lemmas are deferred to Appendix G.l. 


Proof. Both (C2) and (C3) are trivially implementable when k = 0, because Xq = Yq = A/. 

Suppose that both of them are implementable at iteration k — 1. We proceed in 4 steps to prove 
that they are implementable at iteration k as well. 


Step I : prove (C3) for computing Dot (L e ,X k 


Suppose X k is given in the form of X k — (c 


,x 


i - Etc 1 


aL e 


-9 


for some c x > 0, 


P=° Dot (Lp^Xjp/i 

and it satisfies TrA^ £ [1,1 + £i], (This is done by the inductive assumption.) 
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Since TrXfc < 1 + e\ < 3/2, we must have 

x~ l/q = c x ■ i-Y] - ^ - y-i . 

^Dot {L e .,XjYN 3 

This inequality ensures that we can compute X k • L e approximately (up to 1 + £i error) using 
Lemma G.3. Since c x is no more than O^n 1 ^ / y/qs) owing to Lemma G.l. the running time 
for computing X k • L e for all edges e E E is 0{c x qmj e\) = 0(y/qn l / q m/ e\e). 


Step II: prove (C3) for computing Dot (L e ,Y k ). 


Suppose Y k is given in the form of Y k = ^ Y,j=o Do t(L y p/q “ c 

c 5 , and it satisfies Tr Y k E [1,1 + £i]. (This is done by the inductive assumption.) Since 
Tr Y k < 1 + E\ < 3/2, we must have 


ale 


y 


I 


-<? 


for some real 


y-' /q = V —“ r " ; - C Y -iy-i . 

^JDot(L ej ,U) 1 /l 3 

This inequality ensures that we can compute V/ • L e approximately (up to 1 + E\ error) using 
Lemma G.4, Since c 5 is no more than 0{n l / q / y/qE) owing to Lemma G.l, the running time 
for computing Y k • L e for all edges e E E is 0(c^ qm/sf ) = 0(y/qn 1 ^ q m/' e\e)- 


Step III: prove (C2) for X k . 

Suppose that X k _i = f ( b x ■ I — 
must have 

= b x ■ i-Y, 


^k -2 aL ej 

1=0 Dot(2 e ,.,^)V9 

k—2 


) Since TrXj._i < 1 + E\ < 3/2, we 


aL e 


/ - - >= -I 

^Dot (Le.,^-) 1 /® 3 


i=o 

Recall that we have proved that X. Y 9 >- -—— 

F fc-i - Do^L^X;) 1 /? 

the inequality above and the fact that a < 1/4, we have 

fc-i 


i.-V 


i-Y __>- -i. 

YoDot (L e .,Xj)V q 2 


(see Claim F.l I, combining it with 

(G.l) 


Now, we are ready to perform a binary search to find c x . If one selects c x = b x , he will 
get TrXfc > TrXfc_i > 1, and therefore c x = b x is a good lower bound for the choice of c x . 
On the other hand, if one selects c x = b x + n 1//g , he will get TYX k < Yi{n 1 ^ q I)~ q = 1, so 
b x + w}/ q is a good upper bound for the choice of c x . 

In sum, we can binary search c x in the interval of [b x ,b x + n 1 ^]. For each such value 
of c x in the process of the binary search, since c x is no more than 0(v}/ q / y/qs) as per 
Lemma G.l. one can apply Lemma G.3 and approximately compute Tr(Xfc) = ^2 e X k • L e 
up to a multiplicative error of 1 + ei, in time 0(c x qm/ e\) = 0{y/qn 1 ^ q m/ e\e). 

Since the overhead for the binary search is 0(1), the total running time to compute c x at an 
iteration is 0(y/qn 1 ^ q m/ e^e). 


• Step IV: prove (C2) for Y k . 
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Suppose that E fc _i d = f ( Z-J Dot( z^- y/q 
have 


— b } • /) q . Since TV Y^_x < 1 + E\ < 3/2, we must 


Y, 


-1/9 


k-1 


k—2 

E 


Oilip. 


U Dot(L ej) ^) 1/9 


- b Y ■ i y U 


(G.2) 


It is clear from now that it suffices for us to search for c* > b Y , because if one selects c 5 = b Y , 
he will get TrY/ < TrV/_i < 1 + £i, and therefore c 5 = is a good lower bound. However, 
unlike Step III, one cannot perform a simple binary search on c 5 because there is no good 
upper bound for c 5 12 

Instead, consider the following increment-and-binary-search algorithm. Beginning from , 
we first choose c 1 = l? + g. This choice of c* ensures that, according to (G.2). 

fc-i 


Y, 


-1/9 


E 


o.Lp 


- c Y -Ih~I 

-Dot (Le^Yj) 1 /* 2 


1 


Therefore, we can compute Tr(Y/) = d e Yj, • L e approximately using Lemma G.4. If the 
approximation computation from Lemma G.4 tells us that Tr(Y/) > 1, we stop the increment 
of c 5 . Otherwise, we conclude that TV(Y/) is still less than or equal to 1 + e\, and continue 
to try c Y = b T + | for z = 2,3,4,.... We stop this increment until we find some integer i so 
that Tr(lfc) > 1. 

At this moment, we have that 


Tr 


aL e , 


k-1 

y—— 

P, Dot(i ej ,U)V» 

fc-1 


~ (bY+ 


and 


Tr 


v a L ej 

pi Dot(£ ej ,yj)V« 


-(6 ’+g)V) >1 


-9 


Therefore, we can perform a binary search for c 5 between b * + i-_! and b Y + | for, and in 
0(1) time we can find some value in this interval which satisfies Tr(Y/) £[1,1 + £i]. 

Again, since we always have c* < 0(n l / q / sjqe) owing to Lemma G.l. the binary search step 
costs a running time that is at most 0(c^ qm/e\) = 0(^qin}/ q m/ e\e) owing to Lemma G.4. 

The incrementation procedure takes a running time 0(y / Zqn 1 ^ q m/ e 2 e) for each increment of 
However, throughout the algorithm, we increment by 1/6 at most 0(n 1//g /y / ge) times 
in total as per Lemma G.l. This running time, after amortization, is going to be dominated 
by that of the binary search. 

Overall, we have shown that (C2) and (C3) can be implemented to run in 0(yJqn l t q m/E 2 E) 
time (in amortization) per iteration. Since there are a total of at most // iterations, the desired 
running time is obtained. d 


G.l Missing Lemmas 

In this subsection, we state and prove Lemma G.3 and Lemma G.4 for the efficient computations 
of the matrix inverses needed for the previous subsection. 


12 In fact, if one is allowed to compute the smallest eigenvalue of Ylj=o Dot(L . < y-) 1 /g ’ can P er l° rm a binary 

search as described in Section 6. However, we have chosen not to implement that algorithm because the running time 
analysis for the max/min eigenvalue computation is only longer than the current one. 
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Lemma G.3. Suppose that we are given positive reals c and so, ■ ■ ■, Sk-i satisfying cI~Yf k j = o SjL ej 
| I, where each L e is the normalized edge Laplacian and k = 0(m). Let q be any positive 
even integer. Then, we can compute a matrix T £ K m,xm in time 0(cqm/e 2 ), where T has 
m / = ©(logra/ef) rows and satisfies that, with probability at least 1 — 

fc-i 

Ve £ E, X •L e < ||Tx e ||| < (1 + £\)X • L e , where X = (cl - ^ SjL e ^ ? . 

3=0 

Lemma G.4. Suppose we are given positive so,, Sk-i and a possibly negative real c satisfying 
that Yl k jZ o s jL e j — cl >z where each L e is the normalized edge Laplacian and k = 0(m). Let 
q be any positive even integer. Then, we can compute a matrix T £ { n time 0(cqm/e\), 

where T has m! = ©(log n/e\) rows and satisfies that, with probability at least 1 — 

fc-i 

Ve £ E, Y • L e < ||Tx e ||| < (1 + £i)Y • L e , where Y = f ^ s jL ej ~ c/j 

3=0 


Our proofs to the above lemmas rely on the following auxiliary tools. 


G.1.1 Auxiliary Tools 

The first one is the famous Laplacian linear system solver, written in the matrix language. 

Theorem G.5. For parameter a £ [0,1]. Given any Laplacian matrix L that corresponds to a 
graph with m edges, there exist an approximation L which satisfies that, with probability at least 
1 — (1 — 5)L -1 <L -< (1 + 5)L~ 1 , and for every vector v £ M n , L v can be computed in 

time 0{m\og(\/5)). 


-r-l 


Proof. The algorithms presented in [ST04] can be expressed as matrices L which satisfy that, 


-r-l 


with high probability, for every x £ M n , the vectors L~ l x and L are close under the so-called L- 
norrn, or in symbols, ||L x — L~ l x\\ 2 L < 5 2 \\L^ 1 x\\‘j j . After expanding this out using the definition 
of the L-norm, we have 


x T (L~ 1 - L~ 1 )L{L~ 1 - L~ l )x < 5 2 ■ x T L~ 1 LL~ 1 x 
(L _1 - L~ 1 )L(Jj~ 1 - L~ l ) P 5 2 ■ L~ l 
(L 1 / 2 I _ 1 L 1/2 - I ) 2 V S 2 I 
-51 P L l/2 L~ l L l/2 -I <51 


(1 — 5)L~ l PL " V (1 + 5)L 


-r-l 


-l 


The running time 0(mlog(l/<5)) follows from that of [ST04], 


□ 


The next two lemmas are the classical results on approximating (I — A) q and {A — I) q using 
Taylor expansions. 

Lemma G.6. The polynomial P(A) = I + A + • • • + A d ~ l satisfies that for all 0 P A P (1 — 5)1, 

OP (I- A)- 1 - P (A) P (1 - 5) d ■ (I - A)- 1 . 

As a consequence, for every integer q > 1, 

(1 - q(l - 5) d ) ■ (I - A)~ q P P q (A) P(I- A)~ q . 
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(G.3) 


Proof. We first note that for every x E [0,1 — <5], we have 

0 < —-(1 + x + • • • + x d ~ l ) =x d + x d+1 + ■■■ = < (1 ~ 6 ^ d . 

1 — X 1 — X 1 — X 

As a consequence, we have that 

0 A (I — A)- 1 - (1 + A + • • • + A d_1 ) A (1 — 6) d • (/ - A)" 1 , 

which can be proved by first assuming (without loss of generality) that A is diagonal, and then 
analyzing each diagonal entry using (G.3). 

To prove the result for (/ — A )~ q , we first notice that (I — A) -1 and P(A) are commutable. 
Therefore, P(A) A (I — A)^ 1 directly implies P q (A) A (/ — A)~ q , which gives one side of the 
inequality. To see the other side, we rewrite 

(l-(l-^)-(I-A)- 1 AP(A) , 

and then take the q -th power on both sides. This yields 

(1 - q( 1 - 5) d ) • (/ - A)~ q A (1 — (1 — S) d ) q • (I - A)~ q A P*(A) , 
which finishes the proof of the lemma. EH 

Lemma G.7. The polynomial P(A) = A + A 2 + • • • + A d satisfies that for all (1 + 6)1 A A, 

0 A (A — I)- 1 - P(A _1 ) A (1 + 6)~ d • (A - I)- 1 . 

As a consequence, for every integer q > 1, 

(1 - q( 1 + 6)~ d ) • (A - I)~ q A P q (A~ l ) A (A- I)~ q . 


Proof. We first note that for every x > 1 + 6, we have 
1 


0 < 


- (x _i + x~' z H-b x ~ d ) = x 


-d -1 


+ X 


—d—2 


+ ••■ = 


1 1 


< 


1 


x d x — 1 (1 + 5) d x — 1 


x — 1 

As a consequence, we have that 

0 A (A — I)" 1 - (A" 1 + A -2 4-b A~ d ) A (1 + <5)" d • (A - I) -1 


. (G.4) 


which can be proved by first assuming (without loss of generality) that A is diagonal, and then 
analyzing each diagonal entry using (G.4). 

To prove the result for (A — I)~ q , we first notice that (A — /) -1 and P(A -1 ) are commutable. 
Therefore, P(A _1 ) A (A — /) -1 directly implies P 9 (A -1 ) A (A — I)~ q , which gives one side of the 
inequality. To see the other side, we rewrite 

(1 - (1 + 6)~ d ) ■ (A - I)” 1 A P(A _1 ) , 


and then take the q -th power on both sides. This yields 

(1 - q( 1 + 6 )~ d ) • (A - I)~ q A (1 - (1 + 6)~ d ) q ■ (A - I)~ q A P q (A~ l ) , 
which finishes the proof of the lemma. 


□ 


G.1.2 Missing Proofs of Lemma G.3 and G.4 

Lemma G.3, Suppose that we are given positive reals c and so,..., Sk-i satisfying cI—Y^jZo s jI J e j t 
\l, where each L e is the normalized edge Laplacian and k = 0{m). Let q he any positive even inte¬ 
ger. Then, we can compute a matrixT E M m ' xm { n time 0(cqm/e\), where T has m! = 0(logn/ef) 
rows and satisfies that, with probability at least 1 — 

k-l 

Ve E E. X • L e < \\T X e\\l < (1 + £i)X • L e , where X = (cl - ^ SjL e .) 9 . 

3=0 
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Proof. Denoting by A = \ Y^=q s jL ej , we have 0 A A A (1 — ^)J by the assumption. Now 
we apply Lemma G.6, and let P(A) be the matrix polynomial of degree d = ®{c\og(q/£\)) from 
Lemma G.6, By the approximation guarantee, we have for every edge e E E, 

k -1 

X.L e = [cl - Y, SjL es ) q • L e = (l ± ■ c~ q • P 9 (A) . L e . (G.5) 

3=0 


Therefore, it suffices for us to compute P 9 (A) • L e for each possible edge e. 


—l 


Next, let Lc ' be the approximation of from Theorem G.5 that satisfies 


( 1 - 


£l 

10 dq 


—l 


)L g 1 ^ Lg " ^ (1 + 


£ 1 

lOdg 


)A 


-i 

G 


Denoting by L s = f yf L ej . we have A = Lq^ 2 L s Lq^ 2 . Accordingly, for every edge e E E, 

P 9 (A) • L e = Tr(p ? (L- 1/2 L s L" 1/2 ) L~ 1/2 L e L~ 1/2 ^j 
= Tr(p^(L5 1 L s ) L^Le) 

= tv(p'?/ 2 (l- 1 l s ) l- 1 p®/ 2 ^^ 1 )^) 

= Tr (p q/2 (L G l L a ) Lf} B t WB t Lf} P q/2 {L s L G 1 )L^j 
= (1 ± £i/10) • Tr(p'^(Z^L.) L^ 1 B T WB T L^~ l P q l 2 (L s L^ l )L e ) 


= (1 ±ei/10) • w e ■ X T e P q/2 (L G 1 L a ) L g 1 B t WB t L g 1 P q/2 (L s L G l ) Xe 

-i 


= (1 ± ei/10) • w e ■ W 1 / 2 B t L g l P q ' 2 (L s L G l ) Xe 


(G.6) 


Above, ® follows because each L G is a (1 ± ygV) approximation to Lq 1 , while we have at most 
(d — 1 )q + 2 < dq copies of Lf} in any sequence of the matrix multiplication on the left hand side 
of ®. 


For this reason, we can preprocess by computing T' '= QW 1 / 2 B T L G P q ! 2 [L s L G ) E M m,xn , 
where Q E M m,xm is some Johnson-Lindenstrauss random matrix with m! = 0(logn/e 2 ) rows. 
This matrix T' satisfies that, with probability at least 1 — 

— “F-1\ 


Ve E E, 


QW X ' 2 B T L G X P q / 2 (L s L G JXe . =(l±£i/10)||T' Xe || 2 


(G.7) 


Combining (G.5). (G.6), and (G.7) together, we have 

Ve E E, X . L e = (1 ± ei /3) • c~ q • w e • ||T' Xe || 2 • 

Defining T = ( —L-y^ ■ c _<? • tCe) 1 ^ • T 7 , we get the desired inequality in Lemma G.3. 

Finally, we emphasize that the above computation of T requires 0(dq ■ m' ■ m) = 0(cqm/e 2 ) 
time. This is because, each row of T can be computed by left multiplying each row of Q with the 
matrix VF 1 / 2 B T L G P q ! 2 (L s L G ) , 13 The running time now follows from (i) we need to compute 
vector-matrix multiplication O(dq) times, which is the power of the polynomial P c,//2 (-), and (ii) 


13 This can be implemented as follows. For any row vector of Q, denote it by u T G R m . We first sequentially 
compute 

. V T U T W l/2 ) 

• v T «— v T B T , and 

• v T t— v T L g 1 ■ i i 

Now, suppose P q P (L s Lg * *) = Yfi=o Ci(L s LG J )* where each d is the coefficient of the i-th power term. We 
continue and compute 

• w T <— 0. 
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Theorem G.5 implies that for inversion v T L G 1 can be computed in time 0{m,\og(dq/E\)) for any 
vector v. EH 

Lemma G.4. Suppose we are given positive $s_0, \dots, s_{k-1}$ and a possibly negative real $c$ satisfying $\sum_{j=0}^{k-1} s_j L_{e_j} - cI \succeq I$, where each $L_{e_j}$ is a normalized edge Laplacian and $k = O(m)$. Let $q$ be any positive even integer. Then we can compute a matrix $T \in \mathbb{R}^{m' \times n}$ in time $\tilde{O}(cqm/\varepsilon_1^2)$. Here $T$ has $m' = \Theta(\log n/\varepsilon_1^2)$ rows and satisfies, with probability at least $1 - n^{-\Omega(1)}$,

$$\forall e \in E, \quad Y \bullet L_e \le \|T x_e\|_2^2 \le (1 + \varepsilon_1)\, Y \bullet L_e, \qquad \text{where } Y = \Big(\sum_{j=0}^{k-1} s_j L_{e_j} - cI\Big)^{-q}.$$

Proof. There are two cases: c > 0 or c < 0. We begin with the case when c > 0. 

Denote $A = \frac{1}{c}\sum_{j=0}^{k-1} s_j L_{e_j}$. By the assumption, $A \succeq (1 + \frac{1}{c}) I$. Now we apply Lemma G.7 and let $P(\cdot)$ be the matrix polynomial of degree $d = O(c\log(q/\varepsilon_1))$ from Lemma G.7. By the approximation guarantee, we have for every edge $e \in E$,

$$Y \bullet L_e = \Big(\sum_{j=0}^{k-1} s_j L_{e_j} - cI\Big)^{-q} \bullet L_e = (1 \pm \varepsilon_1/10) \cdot c^{-q} \cdot P^q(A^{-1}) \bullet L_e. \qquad \text{(G.8)}$$

Therefore, it suffices to compute $P^q(A^{-1}) \bullet L_e$ for each edge $e$.
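Lemma G.7 is not reproduced in this section, so what follows is only a plausible stand-in with the stated degree, not necessarily the polynomial of Lemma G.7. Since $A \succeq (1+\frac{1}{c})I$, every eigenvalue $\lambda$ of $A^{-1}$ lies in $(0, \frac{c}{c+1}]$. The matrix $(A - I)^{-1} = A^{-1}(I - A^{-1})^{-1}$ has the matching eigenvalue $\frac{\lambda}{1-\lambda} = \sum_{i \ge 1} \lambda^i$. Truncating this Neumann series after $d$ terms gives $P(\lambda)^q = (1-\lambda^d)^q \big(\frac{\lambda}{1-\lambda}\big)^q \ge (1 - q\lambda^d)\big(\frac{\lambda}{1-\lambda}\big)^q$. Hence $d = \lceil (c+1)\ln(10q/\varepsilon_1) \rceil = O(c\log(q/\varepsilon_1))$ keeps the relative error below $\varepsilon_1/10$.

Both matrices are functions of the same symmetric $A$, so checking the ratio eigenvalue by eigenvalue bounds the ratio of traces against any PSD $L_e$. The pure-Python sketch below does exactly that. The function names are hypothetical.

```python
import math

def neumann_degree(c, q, eps1):
    """Degree d with q * (c/(c+1))**d <= eps1/10, via (c/(c+1))**d <= exp(-d/(c+1))."""
    return math.ceil((c + 1) * math.log(10 * q / eps1))

def truncated_neumann(lam, d):
    """P(lam) = lam + lam**2 + ... + lam**d, approximating lam / (1 - lam)."""
    return sum(lam**i for i in range(1, d + 1))

c, q, eps1 = 4.0, 4, 0.1
d = neumann_degree(c, q, eps1)
lam_max = c / (c + 1)   # largest possible eigenvalue of A^{-1} when A >= (1 + 1/c) I
worst = min(
    truncated_neumann(lam, d) ** q / (lam / (1 - lam)) ** q
    for lam in [lam_max * t / 100 for t in range(1, 101)]
)
print(d, worst)
```

The worst ratio occurs at the largest eigenvalue, where it equals $(1 - \lambda_{\max}^d)^q$.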

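Denote $L_s = L_G^{1/2} A L_G^{1/2} = \frac{1}{c}\sum_{j=0}^{k-1} s_j L_G^{1/2} L_{e_j} L_G^{1/2}$. Then $A^{-1} = L_G^{1/2} L_s^{-1} L_G^{1/2}$. Next, let $\tilde{L}_s^{-1}$ and $\tilde{L}_G^{-1}$ be the approximations of $L_s^{-1}$ and $L_G^{-1}$, respectively, from Theorem G.5 that satisfy

$$\Big(1 - \frac{\varepsilon_1}{10dq}\Big) L_s^{-1} \preceq \tilde{L}_s^{-1} \preceq \Big(1 + \frac{\varepsilon_1}{10dq}\Big) L_s^{-1} \quad \text{and} \quad \Big(1 - \frac{\varepsilon_1}{10dq}\Big) L_G^{-1} \preceq \tilde{L}_G^{-1} \preceq \Big(1 + \frac{\varepsilon_1}{10dq}\Big) L_G^{-1}.$$

Accordingly, for every edge $e \in E$,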


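$$\begin{aligned}
P^q(A^{-1}) \bullet L_e &= \mathrm{Tr}\Big(P^q\big(L_G^{1/2} L_s^{-1} L_G^{1/2}\big)\, L_G^{-1/2} (w_e x_e x_e^T) L_G^{-1/2}\Big) \\
&= \mathrm{Tr}\Big(P^q(L_s^{-1} L_G)\, L_G^{-1}\, w_e x_e x_e^T\Big) \\
&= \mathrm{Tr}\Big(P^{q/2}(L_s^{-1} L_G)\, L_G^{-1}\, P^{q/2}(L_G L_s^{-1})\, w_e x_e x_e^T\Big) \\
&= \mathrm{Tr}\Big(P^{q/2}(L_s^{-1} L_G)\, L_G^{-1} B^T W B L_G^{-1}\, P^{q/2}(L_G L_s^{-1})\, w_e x_e x_e^T\Big) \\
&\overset{\text{①}}{=} (1 \pm \varepsilon_1/10) \cdot \mathrm{Tr}\Big(P^{q/2}(\tilde{L}_s^{-1} L_G)\, \tilde{L}_G^{-1} B^T W B \tilde{L}_G^{-1}\, P^{q/2}(L_G \tilde{L}_s^{-1})\, w_e x_e x_e^T\Big) \\
&= (1 \pm \varepsilon_1/10) \cdot w_e \cdot x_e^T P^{q/2}(\tilde{L}_s^{-1} L_G)\, \tilde{L}_G^{-1} B^T W B \tilde{L}_G^{-1}\, P^{q/2}(L_G \tilde{L}_s^{-1})\, x_e \\
&= (1 \pm \varepsilon_1/10) \cdot w_e \cdot \big\| W^{1/2} B \tilde{L}_G^{-1} P^{q/2}(L_G \tilde{L}_s^{-1})\, x_e \big\|_2^2. \qquad \text{(G.9)}
\end{aligned}$$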


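Above, the insertion of $B^T W B$ uses $L_G^{-1} = L_G^{-1} L_G L_G^{-1}$ with $L_G = B^T W B$. Step ① holds because each $\tilde{L}_s^{-1}$ (resp. $\tilde{L}_G^{-1}$) is a $(1 \pm \frac{\varepsilon_1}{10dq})$-approximation to $L_s^{-1}$ (resp. $L_G^{-1}$). Moreover, at most $(d-1)q + 2 \le dq$ copies of $\tilde{L}_s^{-1}$ and $\tilde{L}_G^{-1}$ appear in any of the matrix products in ①.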
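The chain of equalities leading to (G.9) can be sanity-checked with exact inverses, so that ① becomes an equality. The NumPy sketch below makes three assumptions. A random full-column-rank $B$ replaces the incidence matrix, so that $L_G = B^T W B$ is invertible and no pseudo-inverses are needed. A random positive definite matrix stands in for $L_s$. $P$ is an arbitrary polynomial, since the identity holds for any $P$. The helper names `poly` and `sym_power` are hypothetical.

```python
import numpy as np

def poly(M, coeffs):
    """sum_i coeffs[i] * M^i, evaluated by Horner's rule."""
    out = np.zeros_like(M)
    for c in reversed(coeffs):
        out = out @ M + c * np.eye(M.shape[0])
    return out

def sym_power(S, p):
    """S^p for a symmetric positive definite matrix S."""
    vals, vecs = np.linalg.eigh(S)
    return (vecs * vals**p) @ vecs.T

rng = np.random.default_rng(1)
m, n, q = 12, 5, 4
B = rng.standard_normal((m, n))                 # stand-in for the incidence matrix
weights = rng.uniform(0.5, 2.0, m)
L_G = B.T @ (weights[:, None] * B)              # L_G = B^T W B
S = rng.standard_normal((n, n))
L_s = S @ S.T + n * np.eye(n)                   # positive definite stand-in for L_s
coeffs = [0.3, 0.2, 0.1]                        # an arbitrary polynomial P
L_G_inv, L_s_inv = np.linalg.inv(L_G), np.linalg.inv(L_s)

e = 3
x_e, w_e = B[e], weights[e]
y = sym_power(L_G, -0.5) @ x_e
L_e = w_e * np.outer(y, y)                      # w_e L_G^{-1/2} x_e x_e^T L_G^{-1/2}
A_inv = sym_power(L_G, 0.5) @ L_s_inv @ sym_power(L_G, 0.5)

lhs = np.trace(np.linalg.matrix_power(poly(A_inv, coeffs), q) @ L_e)
R = (np.sqrt(weights)[:, None] * B) @ L_G_inv @ np.linalg.matrix_power(poly(L_G @ L_s_inv, coeffs), q // 2)
rhs = w_e * np.sum((R @ x_e) ** 2)
print(lhs, rhs)
```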


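(Footnote 13, continued.)

• For $i \leftarrow 0$ to $dq/2$:

  - $w^T \leftarrow w^T + c_i v^T$;

  - $v^T \leftarrow v^T L_s$;

  - $v^T \leftarrow v^T \tilde{L}_G^{-1}$.

In the end, the row vector $w^T$ is precisely the desired $u^T W^{1/2} B \tilde{L}_G^{-1} P^{q/2}(L_s \tilde{L}_G^{-1})$.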
For this reason, we can preprocess by computing $T' := Q W^{1/2} B \tilde{L}_G^{-1} P^{q/2}(L_G \tilde{L}_s^{-1}) \in \mathbb{R}^{m' \times n}$, where $Q \in \mathbb{R}^{m' \times m}$ is some Johnson-Lindenstrauss random matrix with $m' = \Theta(\log n/\varepsilon_1^2)$ rows. With probability at least $1 - n^{-\Omega(1)}$, this matrix $T'$ satisfies

$$\forall e \in E, \quad \big\| W^{1/2} B \tilde{L}_G^{-1} P^{q/2}(L_G \tilde{L}_s^{-1})\, x_e \big\|_2^2 = (1 \pm \varepsilon_1/10)\, \|T' x_e\|_2^2. \qquad \text{(G.10)}$$


Combining (G.8), (G.9), and (G.10), we have

$$\forall e \in E, \quad Y \bullet L_e = (1 \pm \varepsilon_1/3) \cdot c^{-q} \cdot w_e \cdot \|T' x_e\|_2^2.$$

Defining $T := \big(\frac{1}{1-\varepsilon_1/3} \cdot c^{-q} \cdot w_e\big)^{1/2} \cdot T'$, we get the desired inequality in Lemma G.4.

Finally, we emphasize that the computation of $T$ requires $O(dq \cdot m' \cdot m) = \tilde{O}(dqm/\varepsilon_1^2) = \tilde{O}(cqm/\varepsilon_1^2)$ time, since $d = O(c\log(q/\varepsilon_1))$. This is because each row of $T$ can be computed by left-multiplying the matrix $W^{1/2} B \tilde{L}_G^{-1} P^{q/2}(L_G \tilde{L}_s^{-1})$ by the corresponding row of $Q$.¹⁴ The running time now follows because (i) we need to compute vector-matrix multiplications $O(dq)$ times, which bounds the degree of the polynomial $P^{q/2}(\cdot)$, and (ii) Theorem G.5 implies that the inversions $v^T \tilde{L}_G^{-1}$ and $v^T \tilde{L}_s^{-1}$ can both be computed in time $\tilde{O}(m\log(dq/\varepsilon_1))$ for any vector $v$.

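In the second case, if $c < 0$, we can write

$$Y = \Big(\sum_{j=0}^{k-1} s_j L_{e_j} - cI\Big)^{-q} = \big(L_G^{-1/2}(L_s - cL_G)L_G^{-1/2}\big)^{-q},$$

where now $L_s := L_G^{1/2}\big(\sum_{j=0}^{k-1} s_j L_{e_j}\big)L_G^{1/2}$, without the factor $1/c$.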

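Therefore, denote $L'_s = L_s - cL_G$, which is another graph Laplacian matrix (with positive edge weights, since $c < 0$). We can write

$$\begin{aligned}
Y \bullet L_e &= \mathrm{Tr}\Big(\big(L_G^{-1/2} L'_s L_G^{-1/2}\big)^{-q}\, L_G^{-1/2} (w_e x_e x_e^T) L_G^{-1/2}\Big) \\
&= \mathrm{Tr}\Big((L_s'^{-1} L_G)^{q/2}\, L_G^{-1}\, (L_G L_s'^{-1})^{q/2}\, w_e x_e x_e^T\Big) \\
&= w_e \cdot x_e^T (L_s'^{-1} L_G)^{q/2}\, L_G^{-1} B^T W B L_G^{-1}\, (L_G L_s'^{-1})^{q/2}\, x_e \\
&= w_e \cdot \big\| W^{1/2} B L_G^{-1} (L_G L_s'^{-1})^{q/2}\, x_e \big\|_2^2.
\end{aligned}$$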

It is now clear that, similar to the previous case, we can approximately compute $L_s'^{-1}$ and $L_G^{-1}$ using Theorem G.5 and apply the Johnson-Lindenstrauss dimension reduction. We skip the detailed proof here because it is only a repetition. ∎
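The identity in the second case can be checked the same way. In the NumPy sketch below, a random full-column-rank $B$ again stands in for the incidence matrix (an assumption). $Y$ is built directly from the normalized matrices $L_{e_j} = w_j L_G^{-1/2} x_j x_j^T L_G^{-1/2}$. It is then compared with the closed form $w_e \| W^{1/2} B L_G^{-1}(L_G L_s'^{-1})^{q/2} x_e\|_2^2$, where $L_s = L_G^{1/2}(\sum_j s_j L_{e_j})L_G^{1/2} = B^T \mathrm{diag}(s_j w_j) B$. The helper `sym_power` is hypothetical.

```python
import numpy as np

def sym_power(S, p):
    """S^p for a symmetric positive definite matrix S."""
    vals, vecs = np.linalg.eigh(S)
    return (vecs * vals**p) @ vecs.T

rng = np.random.default_rng(2)
m, n, q, c = 15, 4, 4, -0.7                     # c < 0: the second case
B = rng.standard_normal((m, n))                 # stand-in for the incidence matrix
weights = rng.uniform(0.5, 2.0, m)
s = rng.uniform(0.1, 1.0, m)                    # positive s_j
L_G = B.T @ (weights[:, None] * B)
G_mhalf = sym_power(L_G, -0.5)

# Normalized L_{e_j} and Y = (sum_j s_j L_{e_j} - cI)^{-q}
L_norm = [weights[j] * np.outer(G_mhalf @ B[j], G_mhalf @ B[j]) for j in range(m)]
Y = sym_power(sum(s[j] * L_norm[j] for j in range(m)) - c * np.eye(n), -q)

e = 2
lhs = np.trace(Y @ L_norm[e])

L_s = B.T @ ((s * weights)[:, None] * B)        # L_G^{1/2} (sum_j s_j L_{e_j}) L_G^{1/2}
L_s_prime = L_s - c * L_G                       # positive definite because c < 0
R = (np.sqrt(weights)[:, None] * B) @ np.linalg.inv(L_G) @ np.linalg.matrix_power(
    L_G @ np.linalg.inv(L_s_prime), q // 2)
rhs = weights[e] * np.sum((R @ B[e]) ** 2)
print(lhs, rhs)
```

No polynomial approximation is needed in this case: the exponent $q/2$ is applied directly to $L_G L_s'^{-1}$.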


H Efficient Implementation for Other Problems 

As we have seen in Appendix G, Lemma G.3 and Lemma G.4 are at the core of our efficient implementation for the graph sparsification problem. For any other sparsification problem, we can also obtain fast running times as long as these two lemmas can be suitably revised. We illustrate how to obtain such running times for two applications below.

Sparsifying sums of rank-1 matrices. To solve the problem in Theorem 2, it is not hard to verify that Lemma G.3 can be revised as follows:

Suppose that we are given positive reals $c$ and $s_0, \dots, s_{k-1}$ satisfying $cI - \sum_{j=0}^{k-1} s_j L_{e_j} \succeq I$, where each $L_{e_j} = v_{e_j} v_{e_j}^T$ is an explicit $n \times n$ rank-1 matrix and $k = O(m)$. Let $q$ be any positive even integer. Then we can compute a matrix $T \in \mathbb{R}^{m' \times n}$ in time $\tilde{O}(cqn^2/\varepsilon_1^2)$. Here $T$ has $m' = \Theta(\log n/\varepsilon_1^2)$ rows and satisfies, with probability at least $1 - n^{-\Omega(1)}$,

$$\forall e \in E, \quad X \bullet L_e \le \|T v_e\|_2^2 \le (1 + \varepsilon_1)\, X \bullet L_e, \qquad \text{where } X = \Big(cI - \sum_{j=0}^{k-1} s_j L_{e_j}\Big)^{-q}.$$

¹⁴ This can be implemented in the same manner as in Footnote 13.

The key idea for proving the above variant of Lemma G.3 is the following. The matrix inequality $cI - \sum_{j=0}^{k-1} s_j L_{e_j} \succeq I$ implies that the PSD matrix $M = cI - \sum_{j=0}^{k-1} s_j L_{e_j}$ has condition number at most $O(c)$, because $I \preceq M \preceq cI$. Therefore, one can use, for instance, steepest descent (or even the conjugate gradient or Chebyshev method) to compute $M^{-1}v$ in time $\tilde{O}(cn^2)$ for every vector $v \in \mathbb{R}^n$. Next, one can apply the same Johnson-Lindenstrauss dimension reduction as in the proof of Lemma G.3.
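The solver step can be made concrete. Since $I \preceq M \preceq cI$, steepest descent with exact line search shrinks the error, measured in the $M$-norm, by a factor at most $\frac{c-1}{c+1}$ per iteration. Hence $O(c\log(1/\delta))$ iterations suffice, each costing one $O(n^2)$ product with $M$.

The NumPy sketch below makes a few assumptions. It forms $M$ explicitly once, at cost $O(kn^2)$. It draws random unit vectors $v_j$ and rescales the $s_j$ so that the hypothesis holds. The function name and tolerances are hypothetical.

```python
import numpy as np

def steepest_descent(M, b, tol=1e-10, max_iter=10_000):
    """Approximately solve M x = b for symmetric positive definite M, using exact line search."""
    x = np.zeros_like(b)
    r = b.copy()                      # residual b - M x for x = 0
    b_norm = np.linalg.norm(b)
    for it in range(max_iter):
        if np.linalg.norm(r) <= tol * b_norm:
            return x, it
        Mr = M @ r                    # the only O(n^2) operation per iteration
        alpha = (r @ r) / (r @ Mr)    # minimizes 0.5 x^T M x - b^T x along r
        x += alpha * r
        r -= alpha * Mr
    return x, max_iter

rng = np.random.default_rng(0)
n, k, c = 30, 60, 10.0
V = rng.standard_normal((k, n))
V /= np.linalg.norm(V, axis=1, keepdims=True)    # unit vectors v_j as rows
s = rng.uniform(0.1, 1.0, k)
S = V.T @ (s[:, None] * V)                        # sum_j s_j v_j v_j^T
S *= (c - 1) / np.linalg.eigvalsh(S).max()        # rescale so that I <= cI - S <= cI
M = c * np.eye(n) - S
b = rng.standard_normal(n)
x, iters = steepest_descent(M, b)
print(iters, np.linalg.norm(M @ x - b))
```

Conjugate gradient or the Chebyshev method would reduce the iteration count to $O(\sqrt{c}\log(1/\delta))$. Steepest descent is shown only because it is the simplest of the options named above.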

A variant of Lemma G.4 can be proved in the same way.

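In sum, each iteration of our algorithm in Appendix F is dominated by the time needed to (1) compute the matrix $T \in \mathbb{R}^{m' \times n}$, which takes time $\tilde{O}(cqn^2/\varepsilon_1^2) = \tilde{O}(\sqrt{q}\, n^{2+1/q}/(\varepsilon\varepsilon_1^2))$, and (2) compute $Tv_e$ for all $e \in [m]$, which takes time $\tilde{O}(mn/\varepsilon_1^2)$. Taking into account that there are $T = n/\varepsilon^2$ such iterations, the total running time is

$$\frac{n}{\varepsilon^2} \cdot \tilde{O}\Big(\frac{\sqrt{q}\, n^{2+1/q}}{\varepsilon\varepsilon_1^2} + \frac{mn}{\varepsilon_1^2}\Big) = \tilde{O}\Big(\frac{\sqrt{q}\, n^{3+1/q}}{\varepsilon^3\varepsilon_1^2} + \frac{mn^2}{\varepsilon^2\varepsilon_1^2}\Big).$$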


Subgraph sparsification. Given a weighted undirected graph $G$ that can be decomposed into edge-disjoint subgraphs, the goal of linear-sized subgraph sparsification is to construct a $(1 + O(\varepsilon))$-spectral sparsifier $G'$ of $G$ such that $G'$ consists only of reweighted versions of at most $n/\varepsilon^2$ of the given subgraphs.

In symbols, suppose that the edges of some weighted undirected graph $G$ with $n$ vertices and $m'$ edges are decomposed into a disjoint union $E = \biguplus_{e=1}^{m} E_e$. We are interested in finding scalars $s_e \ge 0$ with $|\{e : s_e > 0\}| \le O(n/\varepsilon^2)$ that satisfy the following. Let $L = \sum_{e=1}^{m} s_e \cdot L_{G[E_e]}$, where $L_{G[E_e]}$ is the graph Laplacian matrix of the subgraph of $G$ induced by $E_e$. Then we require $L_G \preceq L \preceq (1 + \varepsilon) L_G$.
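On small instances, the target condition $L_G \preceq L \preceq (1+\varepsilon)L_G$ can be tested through the spectrum of $L_G^{+1/2} L L_G^{+1/2}$ restricted to the range of $L_G$. The NumPy sketch below assumes a connected graph, so that the only common null direction is the all-ones vector. The example graph, its partition into $E_1, E_2, E_3$, and the helpers `laplacian`, `relative_spectrum`, `is_spectral_approx`, and `combine` are hypothetical.

```python
import numpy as np

def laplacian(n, edges, weights):
    """Weighted graph Laplacian of an undirected graph on vertices 0..n-1."""
    L = np.zeros((n, n))
    for (a, b), w in zip(edges, weights):
        L[a, a] += w
        L[b, b] += w
        L[a, b] -= w
        L[b, a] -= w
    return L

def relative_spectrum(L, L_G, tol=1e-9):
    """Eigenvalues of L_G^{+1/2} L L_G^{+1/2} restricted to the range of L_G."""
    vals, vecs = np.linalg.eigh(L_G)
    keep = vals > tol * vals.max()
    U = vecs[:, keep] / np.sqrt(vals[keep])      # columns v_i / sqrt(lambda_i)
    return np.linalg.eigvalsh(U.T @ L @ U)

def is_spectral_approx(L, L_G, eps, tol=1e-9):
    """True iff L_G <= L <= (1 + eps) L_G up to tol, assuming L vanishes on ker(L_G)."""
    ev = relative_spectrum(L, L_G)
    return bool(ev.min() >= 1 - tol and ev.max() <= 1 + eps + tol)

# A 6-cycle plus two chords, with its edges split into three disjoint subgraphs.
n = 6
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 3), (1, 4)]
weights = [1.0, 2.0, 1.0, 1.5, 1.0, 0.5, 1.0, 2.0]
groups = [[0, 1, 6], [2, 3, 7], [4, 5]]
L_sub = [laplacian(n, [edges[i] for i in g], [weights[i] for i in g]) for g in groups]
L_G = sum(L_sub)

def combine(scalars):
    """L = sum_e s_e L_{G[E_e]}."""
    return sum(s_e * L_e for s_e, L_e in zip(scalars, L_sub))

print(is_spectral_approx(combine([1.1, 1.1, 1.1]), L_G, eps=0.2))   # prints True
```

Scaling every subgraph by the same factor $1.1$ gives relative eigenvalues equal to $1.1$. This passes for $\varepsilon = 0.2$ but fails for $\varepsilon = 0.05$.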


For this sparsification problem, for each $e \in [m]$, we define $L_e = w_e L_G^{-1/2} L_{G[E_e]} L_G^{-1/2}$ to be the normalized subgraph Laplacian scaled by $w_e$. Here, $w_e$ is the scaling parameter that ensures $\mathrm{Tr}\, L_e$ is between $1 - \varepsilon_1$ and $1$. (It suffices to compute $L_G^{-1} \bullet L_{G[E_e]}$ up to a multiplicative $1 + \varepsilon_1$ error, and then assign $w_e \approx (L_G^{-1} \bullet L_{G[E_e]})^{-1}$.)
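The scaling $w_e$ can be illustrated on a small graph. The NumPy sketch below computes $L_G^{-1} \bullet L_{G[E_e]}$ exactly, using a pseudo-inverse obtained from an eigendecomposition. The text only requires a $(1+\varepsilon_1)$-approximation, which would be computed differently at scale. The sketch then sets $w_e$ to the reciprocal, so that $\mathrm{Tr}\, L_e = 1$. As a consistency check, $\sum_e 1/w_e = L_G^{-1} \bullet L_G = n - 1$ for a connected graph. The graph and the helper names are hypothetical.

```python
import numpy as np

def laplacian(n, edges, weights):
    """Weighted graph Laplacian of an undirected graph on vertices 0..n-1."""
    L = np.zeros((n, n))
    for (a, b), w in zip(edges, weights):
        L[a, a] += w
        L[b, b] += w
        L[a, b] -= w
        L[b, a] -= w
    return L

def pinv_power(L, p, tol=1e-9):
    """(L^+)^p for a symmetric PSD matrix L, dropping eigenvalues below tol * max."""
    vals, vecs = np.linalg.eigh(L)
    keep = vals > tol * vals.max()
    return (vecs[:, keep] * vals[keep] ** (-p)) @ vecs[:, keep].T

n = 6
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 3), (1, 4)]
weights = [1.0, 2.0, 1.0, 1.5, 1.0, 0.5, 1.0, 2.0]
groups = [[0, 1, 6], [2, 3, 7], [4, 5]]
L_sub = [laplacian(n, [edges[i] for i in g], [weights[i] for i in g]) for g in groups]
L_G = sum(L_sub)

L_G_pinv = pinv_power(L_G, 1)
# Dot(A, B) = sum of entrywise products = Tr(A B) for symmetric A, B
w = [1.0 / np.sum(L_G_pinv * L_sub_e) for L_sub_e in L_sub]
half = pinv_power(L_G, 0.5)
L_norm = [w_e * half @ L_sub_e @ half for w_e, L_sub_e in zip(w, L_sub)]
print([np.trace(L_e) for L_e in L_norm])
```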

For this particular problem, we do not even need to revise Lemma G.3 or Lemma G.4. Recall that we only need to compute "matrix inversions" of the form

$$\Big(cI - \sum_{j=0}^{k-1} \frac{\alpha\, L_{e_j}}{\mathrm{Dot}(L_{e_j}, X_j)^{1/q}}\Big)^{-q} \bullet L_e.$$

The only difference is that each $L_{e_j}$ is now the sum of a few (scaled) edge Laplacian matrices instead of a single one. This is the same problem that Lemma G.3 solves. The total running time for this subgraph sparsification is therefore


$$\tilde{O}\Big(\frac{\sqrt{q}\, n^{1+1/q}\, m}{\varepsilon^3 \varepsilon_1^2}\Big).$$